REMARKS

Pending Claims 

Claims 1-18, 20, 23, and 29 are currently pending. The Examiner has noted that claims 3-16, 
20, 23, 28, and 29 have been withdrawn from consideration. However, Applicants respectfully remind 
the Examiner that claims 4, 5, 19, 21, 22, 24-27, and 30-45 were cancelled at the time that the 
application was filed. Accordingly, the claims withdrawn from consideration are 3, 6-16, 20, 23, 28 
and 29. Claims 1-2 and 17-18 are currently under examination. Applicants submit that the cancelled 
claims were included in the application as filed in the interest of providing notice to the public of certain 
specific subject matter intended to be claimed, and were intended to be cancelled at the time of filing 
this application in the interest of reducing filing costs. Applicants expressly state that these claims were 
not cancelled for reasons related to patentability, and are in fact fully supported by the specification as 
filed. Applicants expressly reserve the right to reinstate these claims, or to add other claims during 
prosecution of this application, or a continuation or divisional application. Applicants expressly do not 
disclaim the subject matter of any invention disclosed herein which is not set forth in the instantly filed 
claims. 

Elections and Restrictions 

In Item 1 of the outstanding Office Action, the Examiner acknowledges Applicants' election 
with traverse of Group I in Applicants' Response to Restriction Requirement filed October 21, 2002, 
but maintains the requirement and makes it final, asserting that the examination of additional groups 
would prove to be a burdensome search. Applicants again respectfully remind the Examiner that the 
method claims of Groups III, VII, and X of the Restriction Requirement (i.e., claims 9-10, 20, and 23)
are entitled to rejoinder upon the allowance of a product claim per the Commissioner's Notice in the 
Official Gazette of March 26, 1995, entitled "Guidance on Treatment of Product and Process Claims in 
light of In re Ochiai, In re Brouwer and 35 U.S.C. § 103(b)" which sets forth the rules, upon allowance 
of product claims, for rejoinder of process claims covering the same scope of products. See also 
M.P.E.P. § 821.04.

Support for the Amendments

This amendment makes explicit a disclosure that is implicitly present in the specification. Claim 1
has been amended so that part (d) now reads, "an immunogenic fragment of a polypeptide having an
amino acid sequence of SEQ ID NO:1, said fragment comprising at least 15 contiguous amino acid
residues." No new matter has been added by this amendment. Support for this amendment may be
found in the Specification at p. 51, line 1. Entry of the amendment is respectfully requested.

The Rejection 

Claims 1, 2, 17 and 18 stand rejected under 35 U.S.C. § 101 based on the allegation that the 
claimed invention lacks patentable utility. The rejection alleges in particular that: 

• there is no well-established, specific and substantial utility for the claimed polypeptide as 
different stomatin-like proteins would have different functions and the skilled artisan would have 
to determine the function of this particular polypeptide in order to determine how to use it. 

• the claimed invention is not supported by either a substantial and specific asserted utility or a 
well established utility. The specification discloses no uses for the broadly claimed 
polypeptides. A specific utility is one that is particular to the subject matter claimed, while a 
substantial utility is one that defines a "real world" use. Utilities that require or constitute 
carrying out further research to identify or reasonably confirm a "real world" context of use are 
not substantial utilities. 

Utility Rejection - 35 USC § 101 

The rejection of claims 1, 2, 17 and 18 is improper, as the inventions of those claims 
have a patentable utility as set forth in the instant specification, and/or a utility well-known to 
one of ordinary skill in the art. 

The invention at issue, identified in the patent application as a novel integral membrane protein,
abbreviated as IMP, is a polypeptide sequence encoded by a gene that is expressed in prostate tumor,
breast tumor, and pancreatic tumor tissues of humans. The novel polypeptide is demonstrated in the
specification to be a member of the class of integral membrane proteins, whose biological functions
include the regulation of ion channel activity (Specification at p. 3, lines 12-21). As such, the claimed
invention has numerous practical, beneficial uses in toxicology testing, drug development, and the
diagnosis of disease, none of which requires knowledge of how the polypeptide actually functions. The
claimed invention also can be used as a tissue or tumor marker (Specification at p. 15, lines 12-16).

The fact that the claimed polypeptide is a member of the integral membrane protein family alone
demonstrates utility. Each of the members of this class, regardless of its particular function, is
useful. There is no evidence that any member of this class of polypeptides, let alone a substantial 
number of them, would not have some patentable utility. It follows that there is a more than substantial 
likelihood that the claimed polypeptide also has patentable utility, regardless of its actual function. The 
law has never required a patentee to prove more. 

There is, in addition, direct proof of the utility of the claimed invention. Applicants submit with 
this response the unexecuted Declaration of Furness (executed version will be forwarded to the 
Examiner as soon as possible) describing some of the practical uses of the claimed invention in gene 
and protein expression monitoring applications as they would have been understood at the time of the 
patent application. The Furness Declaration describes, in particular, how the claimed polypeptide can 
be used in protein expression analysis techniques such as 2-D PAGE gels and western blots. Using the 
claimed invention with these techniques, persons of ordinary skill in the art can better assess, for 
example, the potential toxic effect of a drug candidate. (Furness Declaration ¶ 12(b)).

The Examiner contends that the claimed polypeptide cannot be useful without precise 
knowledge of its function. But the law never has required knowledge of biological function to prove 
utility. It is the claimed invention's uses, not its functions, that are the subject of a proper analysis under 
the utility requirement. 

In any event, as demonstrated by the Furness Declaration, the person of ordinary skill in the art 
can achieve beneficial results from the claimed polypeptide in the absence of any knowledge as to the 
precise function of the protein. The uses of the claimed polypeptide for gene expression monitoring 
applications including toxicology testing are in fact independent of its precise function.

The Office Action is replete with arguments made and positions taken in a misplaced attempt to
justify the rejections of the claims under 35 U.S.C. §§ 101 and 112. The Examiner's positions and
arguments include that the invention (i.e., the SEQ ID NO:1 polypeptide) is not supported by either a
"specific asserted utility" or a "well established utility" (p. 3 Item 6), "...does not teach a relationship to
any specific disease or establish any involvement in the etiology of any specific disease" (p. 3 Item 6,
also p. 6, first paragraph) or is otherwise insufficient to constitute substantial, specific and credible
utilities for the SEQ ID NO:1 polypeptide (Office Action, e.g., p. 6).

Under the circumstances, Applicants are submitting with this Response a Declaration of
Furness under 37 C.F.R. § 1.132 (the Furness Declaration). As shown below, the Furness Declaration
sets out the many substantial reasons why the Examiner's positions and arguments with respect to the
use of the SEQ ID NO:1 polypeptide are without merit.

I. The Applicable Legal Standard 

To meet the utility requirement of sections 101 and 112 of the Patent Act, the patent applicant
need only show that the claimed invention is "practically useful," Anderson v. Natta, 480 F.2d 1392,
1397, 178 USPQ 458 (CCPA 1973) and confers a "specific benefit" on the public. Brenner v.
Manson, 383 U.S. 519, 534-35, 148 USPQ 689 (1966). As discussed in a recent Court of Appeals
for the Federal Circuit case, this threshold is not high:

An invention is "useful" under section 101 if it is capable of providing some identifiable 
benefit. See Brenner v. Manson, 383 U.S. 519, 534 [148 USPQ 689] (1966); 
Brooktree Corp. v. Advanced Micro Devices, Inc., 977 F.2d 1555, 1571 [24 
USPQ2d 1401] (Fed. Cir. 1992) ("to violate Section 101 the claimed device must be 
totally incapable of achieving a useful result"); Fuller v. Berger, 120 F. 274, 275 (7th 
Cir. 1903) (test for utility is whether invention "is incapable of serving any beneficial 
end"). 

Juicy Whip Inc. v. Orange Bang Inc., 51 USPQ2d 1700 (Fed. Cir. 1999). 

While an asserted utility must be described with specificity, the patent applicant need not
demonstrate utility to a certainty. In Stiftung v. Renishaw PLC, 945 F.2d 1173, 1180, 20 USPQ2d
1094 (Fed. Cir. 1991), the United States Court of Appeals for the Federal Circuit explained:

An invention need not be the best or only way to accomplish a certain result, and it
need only be useful to some extent and in certain applications: "[T]he fact that an 
invention has only limited utility and is only operable in certain applications is not 
grounds for finding lack of utility." Envirotech Corp. v. Al George, Inc., 730 F.2d 
753, 762, 221 USPQ 473, 480 (Fed. Cir. 1984). 

The specificity requirement is not, therefore, an onerous one. If the asserted utility is described 
so that a person of ordinary skill in the art would understand how to use the claimed invention, it is 
sufficiently specific. See Standard Oil Co. v. Montedison, S.p.a., 212 U.S.P.Q. 327, 343 (3d Cir. 
1981). The specificity requirement is met unless the asserted utility amounts to a "nebulous expression" 
such as "biological activity" or "biological properties" that does not convey meaningful information 
about the utility of what is being claimed. Cross v. Iizuka, 753 F.2d 1040, 1048 (Fed. Cir. 1985).

In addition to conferring a specific benefit on the public, the benefit must also be "substantial." 
Brenner, 383 U.S. at 534. A "substantial" utility is a practical, "real-world" utility. Nelson v. Bowler, 
626 F.2d 853, 856, 206 USPQ 881 (CCPA 1980). 

If persons of ordinary skill in the art would understand that there is a "well-established" utility 
for the claimed invention, the threshold is met automatically and the applicant need not make any 
showing to demonstrate utility. Manual of Patent Examining Procedure at § 706.03(a). Only if there
is no "well-established" utility for the claimed invention must the applicant demonstrate the practical 
benefits of the invention. Id. 

Once the patent applicant identifies a specific utility, the claimed invention is presumed to 
possess it. In re Cortright, 165 F.3d 1353, 1357, 49 USPQ2d 1464 (Fed. Cir. 1999); In re Brana,
51 F.3d 1560, 1566, 34 USPQ2d 1436 (Fed. Cir. 1995). In that case, the Patent Office bears the
burden of demonstrating that a person of ordinary skill in the art would reasonably doubt that the
asserted utility could be achieved by the claimed invention. Id. To do so, the Patent Office must
provide evidence or sound scientific reasoning. See In re Langer, 503 F.2d 1380, 1391-92, 183
USPQ 288 (CCPA 1974). If and only if the Patent Office makes such a showing, the burden shifts to 
the applicant to provide rebuttal evidence that would convince the person of ordinary skill that there is 
sufficient proof of utility. Brana, 51 F.3d at 1566. The applicant need only prove a "substantial 
likelihood" of utility; certainty is not required. Brenner, 383 U.S. at 532.

II. Toxicology testing, use as a tissue or tumor marker, regulation of membrane
conductance, and regulation of ion channel activity are sufficient utilities under 35 U.S.C. 
§§ 101 and 112, first paragraph 

The claimed invention meets all of the necessary requirements for establishing a credible utility 
under the Patent Law: There are "well-established" uses for the claimed invention known to persons of 
ordinary skill in the art, and there are specific practical and beneficial uses for the invention disclosed in 
the patent application's specification. These uses are explained, in detail, in the Furness Declaration 
accompanying this response. Objective evidence, not considered by the Patent Office, further 
corroborates the credibility of the asserted utilities. 

A. The claimed polypeptide's membership in the integral membrane protein family 
demonstrates utility 

Because there is a substantial likelihood that the claimed IMP is a member of the family of 
polypeptides known as integral membrane proteins, the members of which are indisputably useful, there 
is by implication a substantial likelihood that the claimed polypeptide is similarly useful. Applicants need 
not show any more to demonstrate utility. In re Brana, 51 F.3d at 1567. 

It is undisputed that the claimed polypeptide is a protein having the sequence shown as SEQ ID
NO:1 in the patent application and referred to as IMP in that application. Applicants have
demonstrated by more than reasonable probability that IMP is a member of the integral membrane
protein family, and that the integral membrane protein family includes stomatin and stomatin-like
proteins, each of which regulates membrane conductance. IMP has structural homology with stomatin.

The Examiner must accept Applicants' assertion that the claimed polypeptide is a member of
the integral membrane protein family and that utility is credible to a reasonable probability unless the
Examiner can demonstrate through evidence or sound scientific reasoning that a person of ordinary skill
in the art would doubt utility. See In re Langer, 503 F.2d 1380, 1391-92, 183 USPQ 288 (CCPA
1974). The Examiner has not provided sufficient evidence or sound scientific reasoning to the contrary.

Nor has the Examiner provided any evidence that any member of the integral membrane protein
family, let alone a substantial number of those members, is not useful. In such circumstances the only
reasonable inference is that the claimed polypeptide must be, like the other members of the integral
membrane protein family, useful.

B. The Office Action failed to demonstrate that a person of ordinary skill in the art 
would reasonably doubt the utility of the claimed invention 

Based principally on citations to scientific literature identifying some of the difficulties involved in 
predicting protein function, the Office Action rejected the pending claims on the ground that the 
Applicants cannot impute utility to the claimed invention based on the homology between the encoded 
polypeptide, IMP, and another polypeptide. The Office Action's rejection is incorrect both as a matter
of fact and as a matter of procedural law.

While the Office Action has cited literature identifying some of the difficulties that may be 
involved in predicting protein function, none suggests that functional homology cannot be inferred by a 
reasonable probability in this case. Importantly, none contradicts Bork's later findings that there is a 
70% accuracy rate for bioinformatics-based predictions in general, and a 90% accuracy rate for the 
prediction of functional features by homology. Bork, Genome Research 10:398-400 (2000). At most, 
these articles individually and together stand for the proposition that it is difficult to make predictions 
about function with certainty. The standard applicable in this case is not, however, proof to certainty, 
but rather proof to reasonable probability. 

The literature cited in the Office Action may show that Applicants cannot prove function by 
homology with certainty, but Applicants need not meet such a rigorous standard of proof. Under the 
applicable law, once the Applicants demonstrate a prima facie case of homology, the Office must 
accept the assertion of utility to be true unless the Office comes forward with evidence showing a 
person of ordinary skill would doubt the asserted utility could be achieved by a reasonable probability. 
See In re Brana, 51 F.3d at 1566; In re Langer, 503 F.2d 1380, 1391-92, 183 USPQ 288 (CCPA
1974). The Office has not made such a showing and, as such, the Office Action's rejection should be 
withdrawn. 

09/898,216
Docket No.: PF-0181-2 CON

C. The uses of IMP for toxicology testing, drug discovery, and disease diagnosis 
are practical uses that confer "specific benefits" to the public 

The claimed invention has specific, substantial, real-world utility by virtue of its use in toxicology 
testing, drug development and disease diagnosis through gene expression profiling. These uses are 
explained in detail in the accompanying Furness Declaration, the substance of which is not rebutted by 
the Patent Examiner. There is no dispute that the claimed invention is in fact a useful tool in two- 
dimensional polyacrylamide gel electrophoresis ("2-D PAGE") analysis and western blots used to 
monitor protein expression and assess drug toxicity. 

The instant application is a continuation application of and claimed priority to United States 
patent application Serial No. 09/095,351 filed on June 9, 1998 (hereinafter "the Hillman '351
application"), which in turn was a divisional application of and claimed priority to United States patent 
application Serial No. 08/781,562 filed on January 9, 1997 (hereinafter "the Hillman '562 
application"), all having the identical specification. 

In his Declaration, Mr. Furness explains the many reasons why a person skilled in the art who 
read the Hillman '562 application on January 9, 1997 would have understood that application to 
disclose the claimed polypeptide to be useful for a number of gene and protein expression monitoring 
applications, e.g., in 2-D PAGE technologies, in connection with the development of drugs and the 
monitoring of the activity of such drugs. (Furness Declaration at, e.g., ¶¶ 11-13). Much, but not all, of
Mr. Furness' explanation concerns the use of the claimed polypeptide in the creation of protein 
expression maps using 2-D PAGE. 

2-D PAGE technologies were developed during the 1980's. Since the early 1990's, 2-D 
PAGE has been used to create maps showing the differential expression of proteins in different cell 
types or in similar cell types in response to drugs and potential toxic agents. Each expression pattern 
reveals the state of a tissue or cell type in its given environment, e.g., in the presence or absence of a 
drug. By comparing a map of cells treated with a potential drug candidate to a map of cells not treated 
with the candidate, for example, the potential toxicity of a drug can be assessed. (Furness Declaration
at ¶ 11.)

The claimed invention makes 2-D PAGE analysis a more powerful tool for toxicology and drug
efficacy testing. A person of ordinary skill in the art can derive more information about the state or
states of tissue or cell samples from 2-D PAGE analysis with the claimed invention than without it. As
Mr. Furness explains:

In view of the Hillman '562 application, the Wilkins article, and other related pre-
January 1997 publications, persons skilled in the art on January 9, 1997 clearly would
have understood the Hillman '562 application to disclose the SEQ ID NO:1
polypeptide or the antibody to SEQ ID NO:1 polypeptide to be useful in 2-D PAGE
analyses for the development of new drugs and monitoring the activities of drugs for
such purposes as evaluating their efficacy and toxicity .... (Furness Declaration, ¶ 10)

* * *

Persons skilled in the art would appreciate that a 2-D PAGE map that utilized the SEQ
ID NO:1 polypeptide sequence would be a more useful tool than a 2-D PAGE map
that did not utilize this protein sequence in connection with conducting protein
expression monitoring studies on proposed (or actual) drugs for treating cancers for
such purposes as evaluating their efficacy and toxicity. (Furness Declaration, ¶ 12)

Mr. Furness' observations are confirmed in the literature published before the filing of the
patent application. Wilkins, for example, describes how 2-D gels are used to define proteins present in
various tissues and measure their levels of expression, the data from which are in turn used in databases:

For proteome projects, the aim of [computer-aided 2-D PAGE] analysis ... is to 
catalogue all spots from the 2-D gel in a qualitative and if possible quantitative manner, 
so as to define the number of proteins present and their levels of expression. Reference 
gel images, constructed from one or more gels, form the basis of two-dimensional gel
databases. (Wilkins, Tab 1, p. 26). 



D. The use of proteins expressed by humans as tools for toxicology testing, drug 
discovery, and the diagnosis of disease is now "well-established"

The technologies made possible by expression profiling using polypeptides are now well- 
established. The technical literature recognizes not only the prevalence of these technologies, but also 
their unprecedented advantages in drug development, testing and safety assessment. These 
technologies include toxicology testing, as described by Furness in his Declaration.

Toxicology testing is now standard practice in the pharmaceutical industry. See, e.g., John C.
Rockett, et al., Differential gene expression in drug metabolism and toxicology: practicalities,
problems, and potential, Xenobiotica 29:655-691 (July 1999) (Reference No. 2):

Knowledge of toxin-dependent regulation in target tissues is not solely an academic 
pursuit as much interest has been generated in the pharmaceutical industry to harness 
this technology in the early identification of toxic drug candidates, thereby shortening the 
developmental process and contributing substantially to the safety assessment of new 
drugs. (Reference No. 2, page 656)

To the same effect are several other scientific publications, including Emile F. Nuwaysir, et al.,
Microarrays and Toxicology: The Advent of Toxicogenomics, Molecular Carcinogenesis 24:153-159
(1999) (Reference No. 3); Sandra Steiner and N. Leigh Anderson, Expression profiling in toxicology --
potentials and limitations, Toxicology Letters 112-13:467-471 (2000) (Reference No. 4).

The more genes - and, accordingly, the polypeptides they encode - that are available for use
in toxicology testing, the more powerful the technique. Control genes are carefully selected for their
stability across a large set of array experiments in order to best study the effect of toxicological
compounds. See attached email from the primary investigator of the Nuwaysir paper, Dr. Cynthia
Afshari, to an Incyte employee, dated July 3, 2000, as well as the original message to which she was
responding (Reference No. 5). Thus, there is no expressed gene which is irrelevant to screening for
toxicological effects, and all expressed genes have a utility for toxicological screening.

In fact, the potential benefits to the public, in terms of lives saved and reduced health care costs,
are enormous. Recent developments provide evidence that the benefits of this information are already
beginning to manifest themselves. Examples include the following:

• In 1999, CV Therapeutics, an Incyte collaborator, was able to use Incyte gene 
expression technology, information about the structure of a known transporter gene, 
and chromosomal mapping location, to identify the key gene associated with Tangier 
disease. This discovery took place over a matter of only a few weeks, due to the 
power of these new genomics technologies. The discovery received an award from the 
American Heart Association as one of the top 10 discoveries associated with heart 
disease research in 1999. 

• In an April 9, 2000, article published by the Bloomberg news service, an Incyte
customer stated that it had reduced the time associated with target discovery and
validation from 36 months to 18 months, through use of Incyte's genomic information
database. Other Incyte customers have privately reported similar experiences. The
implications of this significant saving of time and expense for the number of drugs that
may be developed and their cost are obvious.

• In a February 10, 2000, article in the Wall Street Journal, one Incyte customer stated
that over 50 percent of the drug targets in its current pipeline were derived from the
Incyte database. Other Incyte customers have privately reported similar experiences.
By doubling the number of targets available to pharmaceutical researchers, Incyte
genomic information has demonstrably accelerated the development of new drugs.



Because the Patent Examiner failed to address or consider the "well-established" utilities for the 
claimed invention in toxicology testing, drug development, and the diagnosis of disease, the Examiner's 
rejections should be overturned regardless of their merit. 

E. Objective evidence corroborates the utilities of the claimed invention 

There is in fact no restriction on the kinds of evidence a Patent Examiner may consider in 
determining whether a "real-world" utility exists. "Real-world" evidence, such as evidence showing 
actual use or commercial success of the invention, can demonstrate conclusive proof of utility. 
Raytheon v. Roper, 220 USPQ 592 (Fed. Cir. 1983); Nestle v. Eugene, 55 F.2d 854, 856, 12
USPQ 335 (6th Cir. 1932). Indeed, proof that the invention is made, used or sold by any person or 
entity other than the patentee is conclusive proof of utility. United States Steel Corp. v. Phillips 
Petroleum Co., 865 F.2d 1247, 1252, 9 USPQ2d 1461 (Fed. Cir. 1989). 

Over the past several years, a vibrant market has developed for databases containing all
expressed genes (along with the polypeptide translations of those genes). (Note that the value in these 
databases is enhanced by their completeness, but each sequence in them is independently valuable.) 
The databases sold by Applicants' assignee, Incyte, include exactly the kinds of information made 
possible by the claimed invention, such as tissue and disease associations. Incyte sells its database 
containing the claimed sequence and millions of other sequences throughout the scientific community, 
including to pharmaceutical companies who use the information to develop new pharmaceuticals. 

Both Incyte's customers and the scientific community have acknowledged that Incyte's
databases have proven to be valuable in, for example, the identification and development of drug
candidates. As Incyte adds information to its databases, including the information that can be generated
only as a result of Incyte's discovery of the claimed polypeptide, the databases become even more
powerful tools. Thus the claimed invention adds more than incremental benefit to the drug discovery
and development process.

III. The Patent Examiner's Rejections Are Without Merit 

Rather than responding to the evidence demonstrating utility, the Examiner attempts to dismiss it 
altogether by arguing that the disclosed and well-established utilities for the claimed polypeptide are not 
"specific" or "well-established" utilities. (Office Action at p. 3.) The Examiner is incorrect both as a 
matter of law and as a matter of fact. 

A. The Precise Biological Role Or Function Of An Expressed Polypeptide Is Not 
Required To Demonstrate Utility 

The Patent Examiner's primary rejection of the claimed invention is based on the ground that, 
without information as to the precise biological role ("what IMP is, how it functions," as well as its 
"relationship to any specific disease;" Office Action at p. 3.) of the claimed invention, the claimed 
invention's utility is not sufficiently specific. 

It may be that specific and substantial interpretations and detailed information on biological
function are necessary to satisfy the requirements for publication in some technical journals, but they are
not necessary to satisfy the requirements for obtaining a United States patent. The relevant question is
not, as the Examiner would have it, whether it is known how or why the invention works, In re
Cortright, 165 F.3d 1353, 1359 (Fed. Cir. 1999), but rather whether the invention provides an
"identifiable benefit" in presently available form. Juicy Whip Inc. v. Orange Bang Inc., 185 F.3d
1364, 1366 (Fed. Cir. 1999). If the benefit exists, and there is a substantial likelihood the invention
provides the benefit, it is useful. There can be no doubt, particularly in view of the Furness Declaration
(at, e.g., ¶¶ 10-13), that the present invention meets this test.

The threshold for determining whether an invention produces an identifiable benefit is low.
Juicy Whip, 185 F.3d at 1366. Only those utilities that are so nebulous that a person of ordinary skill
in the art would not know how to achieve an identifiable benefit and, at least according to the PTO
guidelines, so-called "throwaway" utilities that are not directed to a person of ordinary skill in the art at
all, do not meet the statutory requirement of utility. Utility Examination Guidelines, 66 Fed. Reg. 1092
(Jan. 5, 2001).

Knowledge of the biological function or role of a biological molecule has never been required to
show real-world benefit. In its most recent explanation of its own utility guidelines, the PTO
acknowledged as much (66 F.R. at 1095):

[T]he utility of a claimed DNA does not necessarily depend on the function of the
encoded gene product. A claimed DNA may have specific and substantial utility 
because, e.g., it hybridizes near a disease-associated gene or it has gene-regulating 
activity. 

By implicitly requiring knowledge of biological function for any claimed polypeptide, the 
Examiner has, contrary to law, elevated what is at most an evidentiary factor into an absolute 
requirement of utility. Rather than looking to the biological role or function of the claimed invention, the 
Examiner should have looked first to the benefits it is alleged to provide. 

B. Membership in a Class of Useful Products Can Be Proof of Utility 

Despite the uncontradicted evidence that the claimed polypeptide is a member of the integral 
membrane protein family, whose members indisputably are useful, the Examiner refused to impute the 
utility of the members of the integral membrane protein family to IMP. In the Office Action at p.6, the 
Patent Examiner takes the position that unless Applicants can identify which particular biological 
function within the class of integral membrane protein is possessed by IMP, utility cannot be imputed. 
To demonstrate utility by membership in the class of integral membrane proteins, the Examiner would 
require that all integral membrane proteins possess a "common" utility. 

There is no such requirement in the law. In order to demonstrate utility by membership in a 
class, the law requires only that the class not contain a substantial number of useless members. So long 
as the class does not contain a substantial number of useless members, there is sufficient likelihood that 
the claimed invention will have utility and a rejection under 35 U.S.C. § 101 is improper. That is true 
regardless of how the claimed invention ultimately is used and whether the members of the class 
possess one utility or many. See Brenner v. Manson, 383 U.S. 519, 532 (1966); Application of 
Kirk, 376 F.2d 936, 943 (CCPA 1967). 

Membership in a "general" class is insufficient to demonstrate utility only if the class contains a 
substantial number of useless members. There would be, in that case, a substantial likelihood that the 
claimed invention is one of the useless members of the class. In the few cases in which class 
membership did not prove utility by substantial likelihood, the classes did in fact include predominately 
useless members. E.g., Brenner (man-made steroids); Kirk (same); Natta (man-made polyethylene 
polymers). 1 

The Examiner addresses IMP as if the general class in which it is included is not the integral 
membrane protein family, but rather all polypeptides, including the vast majority of useless theoretical 
molecules not occurring in nature, and thus not pre-selected by nature to be useful. While these 
"general classes" may contain a substantial number of useless members, the integral membrane protein 
family does not. The integral membrane protein family is sufficiently specific to rule out any reasonable 
possibility that IMP would not also be useful like the other members of the family. 

Because the Examiner has not presented any evidence that the integral membrane protein class 
of proteins has any, let alone a substantial number, of useless members, the Examiner must conclude 
that there is a "substantial likelihood" that the IMP encoded by the claimed polypeptide is useful. 

Even if the Examiner's "common utility" criterion were correct - and it is not - the integral 
membrane protein family would meet it. It is undisputed that known members of the integral membrane 
protein family regulate membrane conductance and regulate ion channel activity. A person of ordinary 
skill in the art need not know any more about how the claimed invention regulates membrane 
conductance to use it, and the Examiner presents no evidence to the contrary. Instead, the Examiner 
makes the conclusory observation that a person of ordinary skill in the art would need to know 
whether, for example, any given integral membrane protein regulates membrane conductance. The 
Examiner then goes on to assume that the only use for IMP absent knowledge as to how this member 
of the integral membrane protein family actually works is further study of IMP itself. 

1 At a recent Biotechnology Customer Partnership Meeting, PTO Senior Examiner James 
Martinell described an analytical framework roughly consistent with this analysis. He stated that when 
an applicant's claimed protein "is a member of a family of proteins that already are known based upon 
sequence homology," that can be an effective assertion of utility. 

Not so. As demonstrated by Applicants, knowledge that IMP is an integral membrane protein 
is more than sufficient to make it useful for the diagnosis and treatment of various cancers. Indeed, 
IMP has been shown to be expressed in prostate, breast, and pancreatic tumor tissue libraries. The 
Examiner must accept these facts to be true unless the Examiner can provide evidence or sound 
scientific reasoning to the contrary. But the Examiner has not done so. 

C. The uses of IMP in toxicology testing, drug discovery, and disease diagnosis 
are practical uses beyond mere study of the invention itself 

To the extent that the Examiner rejected the claims at issue on the ground that the use of an 
invention as a tool for research is not a "substantial" use, the Examiner's rejection assumes a substantial 
overstatement of the law, and is incorrect in fact. Therefore, it must be overturned. 

There is no authority for the proposition that use as a tool for research is not a substantial utility. 
Indeed, the Patent Office itself has recognized that just because an invention is used in a research setting 
does not mean that it lacks utility (Section 2107.01 of the Manual of Patent Examining Procedure, 8th 
Edition, August 2001, under the heading I. Specific and Substantial Requirements, Research Tools): 

Many research tools such as gas chromatographs, screening assays, and nucleotide 
sequencing techniques have a clear, specific and unquestionable utility (e.g., they are 
useful in analyzing compounds). An assessment that focuses on whether an invention is 
useful only in a research setting thus does not address whether the specific invention is 
in fact "useful" in a patent sense. Instead, Office personnel must distinguish between 
inventions that have a specifically identified substantial utility and inventions whose 
asserted utility requires further research to identify or reasonably confirm. 

The PTO's actual practice has been, at least until the present, consistent with that approach. It has 
routinely issued patents for inventions whose only use is to facilitate research, such as DNA ligases, 
acknowledged by the PTO's Training Materials to be useful. 

The subset of research uses that are not "substantial" utilities is limited. It consists only of those 
uses in which the claimed invention is to be an object of further study, thus merely inviting further 
research on the invention itself. This follows from Brenner, in which the U.S. Supreme Court held that 
a process for making a compound does not confer a substantial benefit where the only known use of 
the compound was to be the object of further research to determine its use. Id. at 535. Similarly, in 
Kirk, the Court held that a compound would not confer substantial benefit on the public merely 
because it might be used to synthesize some other, unknown compound that would confer substantial 
benefit. Kirk, 376 F.2d at 940, 945. ("What Applicants are really saying to those in the art is take 
these steroids, experiment, and find what use they do have as medicines.") Nowhere do those cases 
state or imply, however, that a material cannot be patentable if it has some other, additional beneficial 
use in research. 

Such beneficial uses beyond studying the claimed invention itself have been demonstrated, in 
particular those described in the Furness Declaration. The Furness Declaration demonstrates that the 
claimed invention is a tool, rather than an object, of research, and it demonstrates exactly how that tool 
is used. Without the claimed invention, it would be more difficult to generate information regarding the 
properties of tissues, cells, drug candidates and toxins apart from additional information about the 
polypeptide itself. 

D. The Patent Examiner Failed to Demonstrate That a Person of Ordinary Skill in 
the Art Would Reasonably Doubt the Utility of the Claimed Invention 

Based principally on citations to scientific literature identifying some of the difficulties involved in 
predicting protein function, the Examiner rejected the pending claims on the ground that the applicant 
cannot impute utility to the claimed invention based on its structural similarity to another polypeptide 
undisputed by the Examiner to be useful. The Examiner's rejection is both incorrect as a matter of fact 
and as a matter of procedural law. 

As demonstrated in § II.A., supra, the literature cited by the Examiner is not inconsistent with 
the Applicants' proof of homology by a reasonable probability. It may show that Applicants cannot 
prove function by homology with certainty, but Applicants need not meet such a rigorous standard of 
proof. Under the applicable law, once the applicant demonstrates a prima facie case of homology, the 
Examiner must accept the assertion of utility to be true unless the Examiner comes forward with 
evidence showing a person of ordinary skill would doubt the asserted utility could be achieved by a 
reasonable probability. See In re Brana, 51 F.3d at 1566; In re Langer, 503 F.2d 1380, 1391-92, 
183 USPQ 288 (CCPA 1974). The Examiner has not made such a showing and, as such, the 
Examiner's rejection should be overturned. 

In the present case, the Examiner contended that the degree of amino acid identity among IMP 
and other integral membrane family proteins is insufficient to establish that IMP is a member of the 
integral membrane family of proteins and thus shares the same utilities. The Examiner attempted to 
support this assertion with the teachings of Bowie et al. (Science (1990) 247:1306-1310), and Burgess 
et al. (J. Cell Biol. (1990) 111:2129-2138), all of record and addressed below. However, all of these 
references fail to support the outstanding rejections. 

Applicants submit that the teachings of Bowie et al. are, in part, counter to the outstanding 
rejections, and in part, supportive of the asserted utilities of IMP based on amino acid sequence 
homology to integral membrane family proteins. Careful review of this reference reveals that the 
teachings of Bowie et al. are directed primarily toward studying the effects of site-directed substitution 
of amino acid residues in certain proteins in order to determine the relative importance of these residues 
to protein structure and function. As discussed below in further detail, such experiments are not 
relevant to Applicants' use of amino acid sequence homology to reasonably predict protein function. 

In support of Applicants' use of amino acid sequence homology to reasonably predict the utility 
of the claimed polypeptide, Bowie et al. teach that evaluating sets of related sequences, which are 
members of the same gene family, is an accepted method of identifying functionally important residues 
that have been conserved over the course of evolution. (Bowie et al., page 1306, 1st column, last 
paragraph, and 2nd column, 2nd full paragraph; page 1308, 1st column, last paragraph; page 1310, 1st 
column, last paragraph.) It is known in the art that natural selection acts to conserve protein function. 
As the Examiner stated and as taught by Bowie et al., proteins are tolerant of numerous amino acid 
substitutions that maintain protein function, and it is natural selection that permits these substitutions to 
occur. Conversely, mutations that reduce or abolish protein function are eliminated by natural selection. 
Based on these central tenets of molecular evolution, Applicants submit that the amino acid differences 
among Applicants' claimed polypeptide and known ion channel regulators are likely to occur at 
positions of minimal functional importance, while residues that are conserved are likely those that are 
important for protein function. One of ordinary skill in the art would further conclude that the level of 
conservation observed between Applicants' claimed polypeptide and ion channel regulators is 
indicative of a common function, and hence, common utility, among these proteins. 

The Examiner further cited Burgess et al. as demonstrating the "sensitivity of proteins to 
alterations of even a single amino acid...". (Office Action at p. 5) However, this reference is not 
relevant to the case at hand. Burgess et al. describe mutagenesis of HBGF-1 at an amino acid residue 
known to be important for ligand binding. In this case, particular amino acid residues with known 
importance to protein function were specifically targeted for site-directed mutagenesis. These mutations 
were "artificially" created in the laboratory and, therefore, are not analogous to molecular evolution, 
which is profoundly influenced by natural selection. For example, the deactivating mutations as 
described by Burgess et al. would almost certainly not be tolerated in nature. Furthermore, it is clear 
that over the course of evolution, amino acid residues that are critical for protein function are 
conserved. Thus, the amino acid differences between SEQ ID NO:1 and stomatin and stomatin-like 
proteins are likely to represent substitutions that do not alter protein function. Therefore, the teachings 
of Burgess et al. are not relevant to the case at hand. 

One could then argue that partial loss-of-function mutations do occur in nature, for example, the 
mutation in hemoglobin that causes sickle cell anemia. However, this example is the rare exception in 
evolution, not the rule. Persistence of such a mutation in a population would not be expected by one 
of ordinary skill in the art. Persistence occurs only because of the fluke of heterozygous advantage. 
Therefore, the Examiner's assertion that one of skill in the art would routinely expect to find single 
amino acid substitutions that drastically affect the function of the individual members of a conserved 
protein family is entirely unsubstantiated. 

IV. By Requiring the Patent Applicant to Assert a Particular or Unique Utility, the Patent 
Examination Utility Guidelines and Training Materials Applied by the Patent 
Examiner Misstate the Law 

There is an additional, independent reason to overturn the rejections: to the extent the 
rejections are based on Revised Interim Utility Examination Guidelines (64 FR 71427, December 21, 
1999), the final Utility Examination Guidelines (66 FR 1092, January 5, 2001) and/or the Revised 
Interim Utility Guidelines Training Materials (USPTO Website www.uspto.gov, March 1, 2000), the 
Guidelines and Training Materials are themselves inconsistent with the law. 

The Training Materials, which direct the Examiners regarding how to apply the Utility 
Guidelines, address the issue of specificity with reference to two kinds of asserted utilities: "specific" 
utilities, which meet the statutory requirements, and "general" utilities, which do not. The Training 
Materials define a "specific utility" as follows: 

A [specific utility] is specific to the subject matter claimed. This contrasts to general 
utility that would be applicable to the broad class of invention. For example, a claim to 
a polynucleotide whose use is disclosed simply as "gene probe" or "chromosome 
marker" would not be considered to be specific in the absence of a disclosure of a 
specific DNA target. Similarly, a general statement of diagnostic utility, such as 
diagnosing an unspecified disease, would ordinarily be insufficient absent a disclosure of 
what condition can be diagnosed. 

The Training Materials distinguish between "specific" and "general" utilities by assessing 
whether the asserted utility is sufficiently "particular," i.e., unique (Training Materials at p.52) as 
compared to the "broad class of invention." (In this regard, the Training Materials appear to parallel 
the view set forth in Stephen G. Kunin, Written Description Guidelines and Utility Guidelines, 82 
J.P.T.O.S. 77, 97 (Feb. 2000) ("With regard to the issue of specific utility the question to ask is 
whether or not a utility set forth in the specification is particular to the claimed invention.").) 

Such "unique" or "particular" utilities never have been required by the law. To meet the utility 
requirement, the invention need only be "practically useful," Natta, 480 F.2d at 1397, and confer a 
"specific benefit" on the public. Brenner, 383 U.S. at 534. Thus incredible "throwaway" utilities, such 
as trying to "patent a transgenic mouse by saying it makes great snake food," do not meet this standard. 
Karen Hall, Genomic Warfare, The American Lawyer 68 (June 2000) (quoting John Doll, Chief of the 
Biotech Section of USPTO). 

This does not preclude, however, a general utility, contrary to the statement in the Training 
Materials where "specific utility" is defined (page 5). Practical real-world uses are not limited to uses 
that are unique to an invention. The law requires that the practical utility be "definite," not particular. 
Montedison, 664 F.2d at 375. Applicants are not aware of any court that has rejected an assertion of 
utility on the grounds that it is not "particular" or "unique" to the specific invention. Where courts have 
found utility to be too "general," it has been in those cases in which the asserted utility in the patent 
disclosure was not a practical use that conferred a specific benefit. That is, a person of ordinary skill in 
the art would have been left to guess as to how to benefit at all from the invention. In Kirk, for 
example, the CCPA held the assertion that a man-made steroid had "useful biological activity" was 
insufficient where there was no information in the specification as to how that biological activity could be 
practically used. Kirk, 376 F.2d at 941. 

The fact that an invention can have a particular use does not provide a basis for requiring a 
particular use. See Brana, supra (disclosure describing a claimed antitumor compound as being 
homologous to an antitumor compound having activity against a "particular" type of cancer was 
determined to satisfy the specificity requirement). "Particularity" is not and never has been the sine qua 
non of utility; it is, at most, one of many factors to be considered. 

As described supra, broad classes of inventions can satisfy the utility requirement so long as a 
person of ordinary skill in the art would understand how to achieve a practical benefit from knowledge 
of the class. Only classes that encompass a significant portion of nonuseful members would fail to meet 
the utility requirement. Supra § III.B. (Montedison, 664 F.2d at 374-75). 

The Training Materials fail to distinguish between broad classes that convey information of 
practical utility and those that do not, lumping all of them into the latter, unpatentable category of 
"general" utilities. As a result, the Training Materials paint with too broad a brush. Rigorously applied, 
they would render unpatentable whole categories of inventions heretofore considered to be patentable, 
and that have indisputably benefitted the public, including the claimed invention. See supra § III.B. 
Thus the Training Materials cannot be applied consistently with the law. 






09/898,216 



Docket No.: PF-0181-2 CON 

V. Summary of Arguments Regarding Utility Rejection 



Applicants respectfully submit that rejections for lack of utility based, inter alia, on an 
allegation of lack of specificity as set forth in the Office Action and as justified in the Revised Interim 
and final Utility Guidelines and Training Materials, are not supported in the law. Neither are they 
scientifically correct, nor supported by any evidence or sound scientific reasoning. These rejections are 
alleged to be founded on facts in court cases such as Brenner and Kirk, yet those facts are clearly 
distinguishable from the facts of the instant application, and indeed most if not all nucleotide and protein 
sequence applications. Nevertheless, the PTO is attempting to mold the facts and holdings of these 
cases, "like a nose of wax," 2 to target rejections of claims to polypeptide and polynucleotide 
sequences, as well as to claims to methods of detecting said polynucleotide sequences, where biological 
activity information has not been proven by laboratory experimentation, and they have done so by 
ignoring perfectly acceptable utilities fully disclosed in the specifications as well as well-established 
utilities known to those of skill in the art. As is disclosed in the specification, and even more clearly, as 
one of ordinary skill in the art would understand, the claimed invention has well-established, specific, 
substantial and credible utilities. The rejections are, therefore, improper and should be withdrawn. 

2 "The concept of patentable subject matter under §101 is not 'like a nose of wax which may be 
turned and twisted in any direction * * *.' White v. Dunbar, 119 U.S. 47, 51." (Parker v. Flook, 
198 USPQ 193 (US SupCt 1978)) 

Moreover, to the extent the above rejections were based on the Revised Interim and final 
Examination Guidelines and Training Materials, those portions of the Guidelines and Training Materials 
that form the basis for the rejections should be determined to be inconsistent with the law. 

Written description rejections under 35 U.S.C. § 112, first paragraph 

Claims 1, 2, 17, and 18 have been rejected under the first paragraph of 35 U.S.C. § 112 for 
alleged lack of an adequate written description. This rejection is respectfully traversed. 

The requirements necessary to fulfill the written description requirement of 35 U.S.C. § 112, first 
paragraph, are well established by case law. 

... the applicant must also convey with reasonable clarity to those skilled in 
the art that, as of the filing date sought, he or she was in possession of the invention. 
The invention is, for purposes of the "written description" inquiry, whatever is now 
claimed. Vas-Cath, Inc. v. Mahurkar, 19 USPQ2d 1111, 1117 (Fed. Cir. 1991) 

Attention is also drawn to the Patent and Trademark Office's own "Guidelines for Examination 
of Patent Applications Under the 35 U.S.C. Sec. 112, para. 1", published January 5, 2001, which 
provide that: 

An applicant may also show that an invention is complete by disclosure of sufficiently 
detailed, relevant identifying characteristics which provide evidence that applicant was 
in possession of the claimed invention, i.e., complete or partial structure, other physical 
and/or chemical properties, functional characteristics when coupled with a known or 
disclosed correlation between function and structure, or some combination of such 
characteristics. What is conventional or well known to one of ordinary skill in the art 
need not be disclosed in detail. If a skilled artisan would have understood the inventor 
to be in possession of the claimed invention at the time of filing, even if every nuance of 
the claims is not explicitly described in the specification, then the adequate description 
requirement is met. [footnotes omitted] 

Thus, the written description standard is fulfilled by both what is specifically disclosed and what 
is conventional or well known to one skilled in the art. 

SEQ ID NO:1 is specifically disclosed in the application (see, for example, pages 14-15, lines 
7-11). Variants of SEQ ID NO:1 are described, for example, at page 15, lines 17-20. In particular, 
the preferred, more preferred, and most preferred IMP variants (80%, 90%, and 95% amino acid 
sequence similarity to SEQ ID NO:1) are described, for example, at page 15, lines 17-20. Incyte 
clones in which the nucleic acids encoding the human IMP were first identified and libraries from which 
those clones were isolated are described, for example, at page 14, line 1 of the Specification. 
Chemical and structural features of SEQ ID NO:1 are described, for example, on page 14, lines 7-18. 
Given SEQ ID NO:1, one of ordinary skill in the art would recognize naturally-occurring variants of SEQ ID 
NO:1 having 90% sequence identity to SEQ ID NO:1. Fragments of IMP could be made using either 
recombinant methods (e.g., see pages 19-24) or by chemical synthesis (e.g., see page 17, lines 15-22, 
and page 24, lines 13-19). "Biologically-active" fragments of IMP are defined at page 8, lines 16-17. 
Methods for determining biological activity of IMP and fragments thereof are provided, e.g., at page 
51, lines 22-26. The Specification at pages 14-15 also discusses the biological activity of protein 
homologs of IMP. Accordingly, the Specification provides an adequate written description of the 
recited polypeptide sequences. 

A. The Specification provides an adequate written description of the claimed "variants" of 
SEQ ID NO:1. 

The Office Action has further asserted that the claims are not supported by an adequate written 
description because 

"[t]he written description in this case only sets forth SEQ ID NO:1 and therefore the 
written description is not commensurate in scope with the claims which read on variants 
which as claimed, include naturally occurring polypeptides that are at least 90% identical 
to SEQ ID NO:1, and biologically active and immunogenic fragments of SEQ ID 
NO:1." Office Action at Page 8, lines 5-9. 

Such a position is believed to present a misapplication of the law. 

1. The present claims specifically define the claimed genus through the recitation 
of chemical structure 

Court cases in which "DNA claims" have been at issue (which are hence relevant to claims to 
proteins encoded by the DNA and antibodies which specifically bind to the proteins) commonly 
emphasize that the recitation of structural features or chemical or physical properties are important 
factors to consider in a written description analysis of such claims. For example, in Fiers v. Revel, 25 
USPQ2d 1601, 1606 (Fed. Cir. 1993), the court stated that: 

If a conception of a DNA requires a precise definition, such as by structure, formula, 
chemical name or physical properties, as we have held, then a description also requires 
that degree of specificity. 

In a number of instances in which claims to DNA have been found invalid, the courts have 
noted that the claims attempted to define the claimed DNA in terms of functional characteristics without 
any reference to structural features. As set forth by the court in University of California v. Eli Lilly 
and Co., 43 USPQ2d 1398, 1406 (Fed. Cir. 1997): 

In claims to genetic material, however, a generic statement such as "vertebrate insulin 
cDNA" or "mammalian insulin cDNA," without more, is not an adequate written 
description of the genus because it does not distinguish the claimed genus from others, 
except by function. 

Thus, the mere recitation of functional characteristics of a DNA, without the definition of 
structural features, has been a common basis by which courts have found invalid claims to DNA. For 
example, in Lilly, 43 USPQ2d at 1407, the court found invalid for violation of the written description 
requirement the following claim of U.S. Patent No. 4,652,525: 

1. A recombinant plasmid replicable in procaryotic host containing within its nucleotide 
sequence a subsequence having the structure of the reverse transcript of an mRNA of a 
vertebrate, which mRNA encodes insulin. 

In Fiers, 25 USPQ2d at 1603, the parties were in an interference involving the following count: 

A DNA which consists essentially of a DNA which codes for a human fibroblast 
interferon-beta polypeptide. 

Party Revel in the Fiers case argued that its foreign priority application contained an adequate 
written description of the DNA of the count because that application mentioned a potential method for 
isolating the DNA. The Revel priority application, however, did not have a description of any particular 
DNA structure corresponding to the DNA of the count. The court therefore found that the Revel 
priority application lacked an adequate written description of the subject matter of the count. 

Thus, in Lilly and Fiers, nucleic acids were defined on the basis of functional characteristics 
and were found not to comply with the written description requirement of 35 U.S.C. §112; i.e., "an 
mRNA of a vertebrate, which mRNA encodes insulin" in Lilly, and "DNA which codes for a human 
fibroblast interferon-beta polypeptide" in Fiers. In contrast to the situation in Lilly and Fiers, the 
claims at issue in the present application define polypeptides in terms of chemical structure, rather than 
on functional characteristics. For example, the "variant language" of independent claim 1 recites 
chemical structure to define the claimed genus: 

2. An isolated and purified polynucleotide sequence encoding a polypeptide selected 
from the group consisting of:...b) a naturally-occurring amino acid sequence having at 
least 90% sequence identity to the sequence of SEQ ID NO:1... 

From the above it should be apparent that the claims of the subject application are 
fundamentally different from those found invalid in Lilly and Fiers. The subject matter of the present 
claims is defined in terms of the chemical structure of SEQ ID NO:1. In the present case, there is no 
reliance merely on a description of functional characteristics of the polypeptides recited by the claims. 
In fact, there is no recitation of functional characteristics. Moreover, if such functional recitations were 
included, it would add to the structural characterization of the recited polypeptides. The polypeptides 
defined in the claims of the present application recite structural features, and cases such as Lilly and 
Fiers stress that the recitation of structure is an important factor to consider in a written description 
analysis of claims of this type. By failing to base its written description inquiry "on whatever is now 
claimed," the Office Action failed to provide an appropriate analysis of the present claims and how they 
differ from those found not to satisfy the written description requirement in Lilly and Fiers. 

2. The state of the art at the time of the present invention is further advanced 
than at the time of the Lilly and Fiers applications 

In the Lilly case, claims of U.S. Patent No. 4,652,525 were found invalid for failing to comply 
with the written description requirement of 35 U.S.C. §112. The '525 patent claimed the benefit of 
priority of two applications, Application Serial No. 801,343 filed May 27, 1977, and Application Serial 
No. 805,023 filed June 9, 1977. In the Fiers case, party Revel claimed the benefit of priority of an 
Israeli application filed on November 21, 1979. Thus, the written description inquiry in those cases was 
based on the state of the art at essentially the "dark ages" of recombinant DNA technology. 

The present application has a priority date of January 9, 1997. Much has happened in the 
development of recombinant DNA technology in the 17 or more years between the time of filing of the 
applications involved in Lilly and Fiers and that of the present application. For example, the technique of 
polymerase chain reaction (PCR) was invented. Highly efficient cloning and DNA sequencing 
technology has been developed. Large databases of protein and nucleotide sequences have been 
compiled. Much of the raw material of the human and other genomes has been sequenced. With these 
remarkable advances one of skill in the art would recognize that, given the sequence information of 
SEQ ID NO:1, and the additional extensive detail provided by the subject application, the present 
inventors were in possession of the claimed polynucleotide variants at the time of filing of this 
application. 

3. Summary 

The Office Action failed to base its written description inquiry "on whatever is now claimed." 
Consequently, the Action did not provide an appropriate analysis of the present claims and how they 
differ from those found not to satisfy the written description requirement in cases such as Lilly and 
Fiers. In particular, the claims of the subject application are fundamentally different from those found 
invalid in Lilly and Fiers. The subject matter of the present claims is defined in terms of the chemical 
structure of SEQ ID NO:1. The courts have stressed that structural features are important factors to 
consider in a written description analysis of claims to nucleic acids and proteins. In addition, the genus 
of polypeptides defined by the present claims is adequately described, as evidenced by Brenner et al. 
and consideration of the claims of the '740 patent involved in Lilly. Furthermore, there have been 
remarkable advances in the state of the art since the Lilly and Fiers cases, and these advances were 
given no consideration whatsoever in the position set forth by the Office Action. 

Prior Art Rejections - 35 U.S.C. § 102 

I. Rejection of Claims 1 and 17 Under 35 U.S.C. § 102(b) 

Claims 1 and 17 have been rejected under 35 U.S.C. § 102(b) as being anticipated by 
Wakefield et al. (U.S. Patent No. 5,534,619). To support this rejection, the Examiner has provided a 
sequence alignment showing that a 7 amino acid stretch of a sequence disclosed in Wakefield et al. is 
identical to a 7 amino acid stretch of the sequence set forth in Applicants' SEQ ID NO: 1. Claims 1 
and 17 are drawn, in part, to a polypeptide comprising an immunogenic fragment of a polypeptide 
having an amino acid sequence of SEQ ID NO: 1, and a composition comprising the polypeptide of 
claim 1 and a pharmaceutically acceptable excipient, respectively. 

Wakefield et al. is directed to heparin anticoagulation proteins and polynucleotides encoding 
those proteins. There is no apparent relationship between the function of the Wakefield proteins and 
the integral membrane protein of the present application. Nevertheless, the Office Action has made the 
unsubstantiated assertion that the protein fragment disclosed in Wakefield would be 
immunogenic. There is no evidence that the amino acid sequence of Wakefield which is homologous to 
a seven amino acid stretch of Applicants' polypeptide is immunogenic. 

Regardless, in the interest of expediting prosecution of the subject application, claim 1 has been 
revised to recite, inter alia, an immunogenic fragment of a polypeptide having an amino acid sequence 
of SEQ ID NO:1, said fragment comprising at least 15 contiguous amino acid residues. Wakefield et 
al. does not have any disclosure specifying such fragments or antibodies which would bind thereto. By 
this amendment, Applicants expressly do not disclaim equivalents of the invention which could include 
polynucleotides encoding immunogenic fragments comprising fewer than 15 contiguous amino acid 
residues of SEQ ID NO:1. Applicants do not concede to the Patent Office position; Applicants are 
amending the claims solely to obtain expeditious allowance of the instant application. While not 
conceding to the Patent Office position, it is believed that claim 1, as amended, and dependent claim 7, 
recite patentable subject matter. Therefore, withdrawal of this rejection is requested. 

II. Rejection of Claims 1 and 17 Under 35 U.S.C. § 102(e) 

Claims 1 and 17 have also been rejected under 35 U.S.C. § 102(e) as being anticipated by 
Zhang et al. (U.S. Patent No. 5,670,483). To support this rejection, the Examiner has provided a 
sequence alignment showing that a 7 amino acid stretch of a sequence disclosed in Zhang et al. is 
identical to a 7 amino acid stretch of the sequence set forth in Applicants' SEQ ID NO: 1. The Office 
Action asserts that "[b]ecause any fragment is potentially immunogenic given the correct dosage and 
concentration, the protein fragment disclosed by Zhang et al [sic] in the absence of evidence to the 
contrary would be immunogenic" (Office Action at p. 11). This rejection is traversed. 

Zhang et al. is directed to a macroscopic membrane formed by amphiphilic peptides. There is 
no apparent relationship between the function of the Zhang proteins and the integral membrane protein 
of the present application. Nevertheless, the Office Action has made the unsubstantiated assertion that 
the polypeptide sequence disclosed in Zhang would encode an immunogenic polypeptide. There is no 
evidence that the amino acid sequence of Zhang which is homologous to a seven amino acid stretch of 
Applicants' polypeptide is immunogenic. 

To expedite prosecution, claim 1 has been amended to recite, inter alia, an immunogenic 
fragment of a polypeptide having an amino acid sequence of SEQ ID NO:1, said fragment comprising 
at least 15 contiguous amino acid residues. Support for this amendment can be found in the 
specification at, for example, page 51, line 1. By this amendment, Applicants expressly do not disclaim 
equivalents of the invention which could include polynucleotides encoding immunogenic fragments 
comprising fewer than 15 contiguous amino acid residues of SEQ ID NO:1. Applicants do not 
concede to the Patent Office position; Applicants are amending the claims solely to obtain expeditious 
allowance of the instant application. While not conceding to the Patent Office position, it is believed 
that claim 1, as amended, and dependent claim 7, recite patentable subject matter. Therefore, 
withdrawal of this rejection is requested. 



CONCLUSION 

In light of the above amendments and remarks, Applicants submit that the present application is 
fully in condition for allowance, and request that the Examiner withdraw the outstanding rejections. 
Early notice to that effect is earnestly solicited. 

If the Examiner contemplates other action, or if a telephone conference would expedite 
allowance of the claims, Applicants invite the Examiner to contact Applicants' Attorney at 
(650) 855-0555. 

Applicants believe that no fee is due with this communication. However, if the USPTO 
determines that a fee is due, the Commissioner is hereby authorized to charge Deposit Account No. 
09-0108. 



Respectfully submitted, 
INCYTE CORPORATION 






J-otTL. Kerber 
Reg. No. 41,113 

Direct Dial Telephone: (650) 845-4894 



3160 Porter Drive 
Palo Alto, California 94304 
Phone: (650) 855-0555 
Fax: (650) 849-8886 

VERSION WITH MARKINGS TO SHOW CHANGES MADE 

IN THE CLAIMS: 

1. (Once amended) An isolated polypeptide comprising an amino acid sequence selected from the 
group consisting of: 

a) a polypeptide comprising an amino acid sequence of SEQ ID NO:1, 

b) a naturally occurring polypeptide comprising an amino acid sequence at least 90% 
identical to an amino acid sequence of SEQ ID NO:1, 

c) a biologically active fragment of a polypeptide having an amino acid sequence of SEQ 
ID NO: 1, and 

d) an immunogenic fragment of a polypeptide having an amino acid sequence of SEQ ID 
NO: 1, said fragment comprising at least 15 contiguous amino acid residues. 

identify all cDNA species, and the approach does not easily allow a [illegible] 
screening. Analysis of gene expression by the study of proteins present in a cell or 
tissue presents a favorable alternative. This can be achieved by use of two-dimensional 
(2-D) gel electrophoresis, quantitative computer image analysis, and protein identification 
techniques to create 'reference maps' of all detectable proteins. Such reference 
maps establish patterns of normal and abnormal gene expression in the organism and 
allow the examination of some post-translational protein modifications which are 
functionally important for many proteins. It is possible to screen proteins systematically 
from reference maps to establish their identities. 

To define protein-based gene expression analysis, the concept of the 'proteome' 
was recently proposed (Wilkins et al., 1995; Wasinger et al., 1995). A proteome is the 
entire PROTein complement expressed by a genOME, or by a cell or tissue type. The 
concept of the proteome has some differences from that of the genome as, while there 
is only one definitive genome of an organism, the proteome is an entity which can 
change under different conditions, and can be dissimilar in different tissues of a single 
organism. A proteome nevertheless remains a direct product of a genome. Interestingly, 
the number of proteins in a proteome can exceed the number of genes present, 
as protein products expressed by alternative gene splicing or with different post-translational 
modifications are observed as separate molecules on a gel. As an 
extrapolation of the concept of the 'genome project', a 'proteome project' is research 
which seeks to identify and characterise the proteins present in a cell or tissue and 
define their patterns of expression. 

Proteome projects present challenges of a similar magnitude to that of genome 
projects. Technically, the 2-D gel electrophoresis must be reproducible and of high 
resolution, allowing the separation and detection of the thousands of proteins in a cell. 
Low copy number proteins should be detectable. There should be computer gel image 
analysis systems that can qualitatively and quantitatively catalogue the electrophoretically 
separated proteins, to form reference maps. A range of rapid and reliable techniques 
must be available for the identification and characterisation of proteins. As a consequence 
of a proteome project, protein databases must be assembled that contain 
reference information about proteins; such databases must be linked to genomic 
databases and protein reference maps. Databases should be widely accessible and easy 
to use. 

Recently, there have been many changes in the techniques and resources available 
for the analysis of proteomes. It is the aim of this chapter to discuss the status of the 
areas outlined above, and to review briefly the progress of some current proteome 
projects. 

Two-dimensional electrophoresis of proteomes 

Two-dimensional (2-D) gel electrophoresis involves the separation of proteins by their 
isoelectric point in the first dimension, then separation according to molecular weight 
by sodium dodecyl sulfate electrophoresis in the second dimension. Since first 
described (Klose, 1975; O'Farrell, 1975; Scheele, 1975), it has become the method of 
choice for the separation of complex mixtures of proteins, albeit with many modifications 
to the original techniques. 2-D electrophoresis forms the basis of proteome 
projects through separating proteins by their size and charge (Hochstrasser et al., 
1992). Current protocols can resolve up to three thousand proteins from a complex 
sample on a single gel (Figure 1). 

2-D GEL RESOLUTION AND REPRODUCIBILITY 

A primary challenge of separating complex mixtures of proteins by 2-D gel electrophoresis 
has been to achieve high resolution and reproducibility. High resolution 
ensures that a maximum of protein species are separated, and reproducibility is 
vital to allow comparison of gels from day to day and between [illegible]. These 
factors can be difficult to achieve [illegible] 

sepaiaie pro.e.nsofh. ? herpl (7 10 IIJl.O'Farrell I97<- O F-n.n r ^ M 10 

OFarreH. ,977, LWon „„ acl> , lhe uw of cJer 1 '^,^ 

footing procedure is susceptible to cathode drift* wherebv nH «.r- a ' v0?,can * 

.> r , teBU?l or a mPh oK., r chansc «l; m ^ SS^;^ 

Carr.er umpnolyie pH gradients are also distorted bv hieh •. , ~" 

limitation is that iso electric foeusinj f els. »hich are caM and su hj«i lo eiec.™^ 
... m narrow f to, .ubes. „ eed , 0 „ „ lnlded b ...tata, m£ J s JS^SSSS; 
,0 the second dim.-ns.on - a procedure that potentially distort, .he 

~Hcd"^^^ 

,990, h H, ? n sen S „,v„y de.ec.ion , then J^t^-J.'o?^^^ 

pho>phonmaging plates (Bonner and Laskev 1974- i 0 h^, n „ D , orc ^ rj P n > or 
,990: Pa.terson and La„e, ,99.H H^EttSZSZZZ 
organi smi o: tissues that can be radiolahelled. " pr,,cl,cjn,e ,or 

An alternative technique, which is becoming the method of choice for the first 
dimension separation of proteins, involves isoelectric focusing in immobilised pH 
gradient (IPG) gels (Bjellqvist et al., 1982; Görg, Postel and Günther, 1988; Righetti, 
1990). Immobilized pH gradients are formed by the covalent coupling of the pH 
gradient into an acrylamide matrix, creating a gradient that is stable with 
time. IPG gels are usually poured onto a [illegible], which is 
strong and provides easy gel handling (Ostergren, Eriksson and Bjellqvist, [year illegible]). 
major advances 0 f , PG 5cpanlllo „ 5 m ,„„ „, , uff „' fr „„, ' * 8 ' 

,hey al,osv focusne of basic and very ac.d.c prote.ns ,o tJZJZZjl^"* ' 
he prec.e.y .ailored ,„„ear. stepwise. „emoidal,. and L pa" "onf ven 
narrow pH ranee arc possible , Oft* pH uni.s per cm„Ri..„c„i i«»oT.M 7 
>9£ .993, S,nha , „,. ,99U: 09r S „ „,. P „ 88: 0.^.', 
I WW.. Ho»evcr. ,i no. curren.lv possible ... use IPC -el. ... scnara,, ,! L 
rroie.n. of ,oelec,r,c po.n, crea.er .ban ,0. a„hou S b ,„ ' ,, ,Xd ^mcm 
Nam™ pH ranee senaralions are useful lo addres, problem. of „, T 1Wtm - 
in complex sample,. a„o,,„ ? in - on r^Zl^T^Z 

^ are now commercially available, which begin to addresl the problem of m, • 
and inter-lab isoelectric focus.ng reproducibility. P™ium of hum- 

There are two means of electrophoresis for the seeonri .lim*. 
preen, venica, slab eels and horLu, ul.ra. n e C6 X Zc'". 

a.n lam.de «h,ch separa.e pro.e.ns in the molecular mass rMm of ill - I 

Mackmt ? el ,s no. usually used with slab sels. hu, is nccessan- ,s h-„ . I 

tel «,ups ,Cor ? . Pos,e. and Gun.her. 19.S,. cl^^^ S J^T a, ! , 

< or no difference in .he reproducibility of elecu^htrLi . ' 15 

.Corneu e, „L ,994a,. bu, «JLJS^L2S^J^ 
will provide S rea,er reproducibili.y for cL.ona, 




[Figure 1: caption only partly legible in the source. The second dimension was SDS-PAGE; 
proteins were visualised with amido black.] 

Piperazine diacrylyl as a gel crosslinker and the addition of thiosulfate in the 
catalyst system has been shown to give better resolution and higher sensitivity 
detection (Hochstrasser and Merril, 1988; Hochstrasser, Patchornik and Merril, 
[year illegible]). 

Notwithstanding the advances described above, there is an increased demand to 
improve the reproducibility of 2-D electrophoresis to facilitate database [illegible] 
and proteome studies. Harrington et al. (1993) explain that if a gel resolves [number 
illegible] protein spots, and there is 99.5% spot matching from gel to gel, this will 
produce [number illegible] spot errors per gel. This amount of error, which might 
accumulate with each gel to gel comparison used in database construction, could 
produce an unacceptable degree of uncertainty in gel databases. To address these 
issues, partial automation of large 2-D gel separations has been undertaken (Nokihara, 
Morita and Kuriki, [year illegible]; [illegible]), [illegible] in one study was found to be 
threefold improved over manual methods (Harrington et al., [year illegible]). It should 
be noted that small 2-D gel [illegible] almost completely automated (Brewer et al., 
1986), although these [illegible] used for database studies. 
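
The arithmetic behind the spot-matching concern can be sketched as follows. This is an 
illustration only: the 99.5% match rate comes from the text, but the spot count of 3000 is 
a hypothetical value (chosen because current protocols are said to resolve up to three 
thousand proteins), and the assumption that mismatches are independent from one 
comparison to the next is a simplification introduced here. 

```python
def expected_mismatches(spots_per_gel: int, match_rate: float) -> float:
    """Expected number of mismatched spots in one gel-to-gel comparison."""
    if spots_per_gel < 0 or not 0.0 <= match_rate <= 1.0:
        raise ValueError("need spots_per_gel >= 0 and 0 <= match_rate <= 1")
    return spots_per_gel * (1.0 - match_rate)


def fraction_correct_after(match_rate: float, comparisons: int) -> float:
    """Fraction of spots still correctly matched after a chain of comparisons.

    Simplifying assumption (not from the source): mismatches occur
    independently at every gel-to-gel comparison.
    """
    if comparisons < 0 or not 0.0 <= match_rate <= 1.0:
        raise ValueError("need comparisons >= 0 and 0 <= match_rate <= 1")
    return match_rate ** comparisons


if __name__ == "__main__":
    spots = 3000        # hypothetical spot count, for illustration only
    match_rate = 0.995  # match rate quoted in the text
    print(expected_mismatches(spots, match_rate))
    print(fraction_correct_after(match_rate, 10))
```

Under these assumptions a 3000-spot gel would carry roughly 15 mismatched spots per 
comparison, and the correctly matched fraction shrinks with every further gel added to a 
database. 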



MICROPREPARATIVE 2-D GEL ELECTROPHORESIS 

With the advent of affordable protein microcharacterisation techniques, including N-terminal 
microsequencing, amino acid analysis, [illegible] analysis and monosaccharide 
compositional analysis, a new challenge for 2-D electrophoresis has been to maintain 
high resolution and reproducibility but also to provide protein in sufficient quantities 
for chemical analysis (high nanogram to low microgram quantities of protein per spot). 
This becomes difficult to achieve with very complex samples such as whole bacterial 
cells, as the initial protein load is divided among [illegible] to 4000 protein species. 
Two approaches are used for producing amounts of protein that can be chemically 
characterised. The first method is to run multiple gels [illegible] and pool the spots of 
interest, and subject them to concentration (Ji et al., 1994; Walsh et al., [year 
illegible]; Rasmussen et al., 1992). In this approach, the concentration [illegible] 
also act as a purification step to remove accumulated electrophoretic contaminants 
such as glycine. A more elegant approach has been to exploit the high loading capacity 
of IPG isoelectric focusing. The high loading capacity of immobilised pH gradients 
was described early (Ek, Bjellqvist and Righetti, 1983), but has only [illegible] been 
applied to 2-D electrophoresis (Hanash et al., 1991; Bjellqvist et al., [year illegible]). 
Up to [illegible] mg of protein can be applied to a single gel, yielding microgram 
quantities of hundreds of protein species. A further benefit of this approach is that 
proteins present in low abundance, which may not be seen with lower protein loads, 
are more likely to be detected. The use of electrophoretic or chromatographic 
prefractionation techniques (Hochstrasser et al., 1991a; Harrington et al., [year 
illegible]) [illegible] of narrow-range IPG separations (Bjellqvist et al., [year 
illegible]) provides a likely solution to studies on proteins present in low abundance. 

Methods of protein detection 

[illegible] proteins from [illegible]. The method used will be 
dictated by factors including protein load on gel (analytical or preparative), the 
purpose of the gel (for protein quantitation or for blotting and chemical characterisation), 
and the sensitivity required. The most common means of protein detection and 
their applications are shown in Table 1. Most detection [illegible] 
example some glycoproteins are not stained by coomassie blue (Goldberg et al., 
1988), and many organic dyes are unsuitable for protein detection on PVDF [illegible] 

Although most means of protein detection give some indication of the quantities of 
protein present, in general they cannot be used for global quantitation. This is because 
no protein stain is able consistently to detect proteins over a wide range of concentrations, 
isoelectric points and amino acid compositions, and with a variety of 
post-translational modifications (Goldberg et al., 1988; Li et al., 1989). Furthermore, 
there are large differences in staining pattern when identical gels or blots are subjected 
to different stains, including amido black, imidazole zinc, india ink, ponceau S, 
colloidal gold, or coomassie blue (Tovey, Ford and Baldo, 1987; Ortiz et al., [year 
illegible]). The most common means of quantitating large numbers of proteins in a 2-D gel 
involves the radiolabelling of protein samples prior to electrophoresis, and protein 
quantitation based on fluorography and image analysis or liquid scintillation counting 
(Garrels, 1989; Celis and Olsen, 1994). However, proteins which do not contain 
methionine cannot be detected if only [35S]methionine is used for labelling. Amino 
acid analysis of protein spots visualised by other techniques presents a likely means of 
protein quantitation for the future. 



BLOTTING OF PROTEINS TO MEMBRANES 

Electrophoretic blotting of proteins from two-dimensional polyacrylamide gels to 
membranes presents many options for protein identification and microcharacterisation 
which are not possible when proteins remain in gels. For example, when proteins are 
blotted to polyvinylidene difluoride (PVDF) membranes, they can be identified by N-terminal 
sequencing, amino acid analysis, or immunoblotting, or they may be subjected 
to endoproteinase digestion, monosaccharide analysis, phosphate analysis or direct 
matrix-assisted laser desorption ionisation mass spectrometry (Matsudaira, 1987; 
Wilkins et al., [year illegible]; Jungblut et al., 1994; Sutton et al., 1995; Rasmussen 
et al., [year illegible]; Weitzhandler et al., 1993; Murthy and Iqbal, 1991; Eckerskorn 
et al., [year illegible]). It is possible to combine some of these procedures on a single 
protein spot on a PVDF membrane (Packer et al., 1995; Wilkins et al., submitted; 
Weitzhandler et al., [year illegible]). This is useful when minimal amounts of protein 
are available for analysis. These techniques will be explored in detail later in this 
review. Notwithstanding the above, there are some disadvantages associated with 
blotting of proteins to membranes. There is always loss of sample during blotting 
procedures (Eckerskorn and Lottspeich, 1993), and common protein detection methods 
are less sensitive or not applicable to membranes (Table 1), presenting difficulties for 
the analysis of low abundance proteins. Detailed discussion of the merits of available 
membranes and of [illegible] blotting techniques can be found elsewhere (Eckerskorn 
and Lottspeich, 1993; Strupat et al., 1994; Patterson, 1994). 



2-D gel analysis, documentation, and proteome databases 

Following protein electrophoresis and detection, detailed analysis of gel images is 
undertaken with computer systems. For proteome projects the aim of this [illegible] 
to catalogue all spots from the 2-D gel in a qualitative and quantitative manner, 
[illegible] define the number of proteins present and their levels of expression. 
Reference gel images, constructed from one or more gels, form the basis of 
two-dimensional gel databases. These databases also contain protein [illegible] 
linked to or integrated [illegible] comprehensive [illegible] databases, containing DNA 
sequence data, [illegible] and protein functional [illegible], as genome and proteome 
projects progress ([illegible] Protein Database cited in Garrels et al., 1994). 



GEL IMAGE ANALYSIS AND REFERENCE GELS 

After 2-D electrophoresis and protein visualisation by [illegible] or 
phosphorimaging, images of gels are digitised [illegible] 
laser densitometer, or charge-coupled device (CCD) camera (Garrels, 1989; 
Celis et al., 1990a; [illegible]). [illegible] 
resolution of 100 - 200 µm, and can detect a wide range of densities (256 
or more grey levels). Following this, images are subjected to a series of manipulations 
to remove vertical and horizontal [illegible], to detect 
spot positions and boundaries, and to [illegible]. A standard 
spot (SSP) number, containing vertical and horizontal [illegible], is 
assigned to each detected spot and becomes the protein's [illegible]. 
Some notable software packages are [illegible]. 



Table 2: Some Software Packages for the Analysis of Gel Images. 

System                          References 

ELSIE [version illegible]       Olson and Miller, 1988; Wirth et al., [year illegible] 
GELLAB I & II                   Wu, Lemkin and Upton, [year illegible]; Lemkin [illegible]; 
                                Myrick et al., [year illegible] 
MELANIE I & II                  Appel et al., 1991; Hochstrasser et al., [year illegible] 
QUEST I & II and PDQUEST        Garrels, [year illegible]; Celis et al., [year illegible] 
TYCHO & KEPLER                  Anderson et al., [year illegible]; Richardson, Horn and 
                                Anderson, 1994 



As there are difficulties in the electrophoresis of [illegible], 
reference gel images are often constructed from many gels of the same sample 
(Garrels and Franza, 1989; Neidhardt et al., 1989). The matching of 
2000 to 4000 proteins from one gel to another presents a challenge to 
image analysis systems. Matching of gels is [illegible] operator, who 
manually designates approximately 50 [illegible] spots [illegible] on gels 
to be cross-matched. [illegible] using computer-based [illegible] the matching over 
the entire gel. Close to [figure illegible] of spots [illegible], although operator 
intervention may be required (Olson and Miller, 1988; Lemkin and Lester, 1989; 
Garrels, 1989; Myrick et al., [year illegible]). 


CALCULATION OF PROTEIN ISOELECTRIC POINT AND MOLECULAR WEIGHT 

Estimation of the isoelectric point (pI) and molecular weight (MW) of proteins from 
2-D gels provides fundamental parameters for each protein, which are also of use 
during identification procedures (see following section). The pI and MW of proteins 
are recorded in 2-D gel databases. Accurate estimations of protein pI and MW can be 
obtained by using 20 or more known proteins on a reference map to construct standard 
curves of pI and molecular weight, which are then used to calculate estimated pI and 
MW of unknown proteins (Neidhardt et al., 1989; Garrels and Franza, 1989; 
VanBogelen, Hutton and Neidhardt, 1990; Anderson and Anderson, 1991; Anderson 
et al., [year illegible]). [illegible] the MW of proteins blotted 
to PVDF can be determined very accurately by direct mass spectrometry (Eckerskorn 
et al., 1992). Where immobilised pH gradients are used, the focusing position of 
proteins allows their pI to be measured within 0.15 units of that calculated from the 
amino acid sequence (Bjellqvist et al., 1993c). It must be noted, however, that proteins 
carrying post-translational modifications may migrate to unexpected pI or MW 
positions during electrophoresis (Packer et al., 1995). 
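
The standard-curve approach described above can be sketched as follows. This is a minimal 
illustration under stated assumptions, not the procedure of any cited study: it assumes a 
linear relationship between horizontal gel position and pI, and between vertical gel 
position and log10(MW). The marker coordinates and values are hypothetical, and only three 
markers are shown where the text recommends 20 or more. 

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def build_estimator(markers):
    """Build a (x_mm, y_mm) -> (pI, MW) estimator from known marker proteins.

    markers: iterable of (x_mm, y_mm, known_pI, known_MW_in_Da).
    Assumed model (hypothetical): pI is linear in x, log10(MW) is linear in y.
    """
    markers = list(markers)
    pi_slope, pi_icpt = fit_line([m[0] for m in markers], [m[2] for m in markers])
    mw_slope, mw_icpt = fit_line(
        [m[1] for m in markers], [math.log10(m[3]) for m in markers]
    )

    def estimate(x_mm, y_mm):
        return pi_slope * x_mm + pi_icpt, 10 ** (mw_slope * y_mm + mw_icpt)

    return estimate


# Hypothetical marker proteins: (x_mm, y_mm, pI, MW in Da)
MARKERS = [
    (10.0, 20.0, 4.5, 100_000),
    (60.0, 70.0, 5.5, 31_623),
    (110.0, 120.0, 6.5, 10_000),
]

if __name__ == "__main__":
    estimate = build_estimator(MARKERS)
    pi, mw = estimate(85.0, 95.0)
    print(f"estimated pI {pi:.2f}, estimated MW {mw:.0f} Da")
```

As the text cautions, an estimate of this kind can be misleading for proteins carrying 
post-translational modifications, which may migrate to unexpected positions. 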



SPOT QUANTITATION AND EXPRESSION ANALYSIS 

A major challenge faced in proteome projects is the quantitative analysis of proteins 
separated by 2-D electrophoresis. The most accurate means of protein quantitation is 
to determine chemically the amount of each protein present by amino acid compositional 
analysis. However, the current method of choice for quantitative analysis 
of many proteins is to radiolabel samples with [35S]methionine or 14C amino acids, 
perform the 2-D electrophoresis, and measure protein levels in disintegrations per 
minute (dpm) or units of optical density. Quantitation is achieved either by liquid 
scintillation counting or by gel image analysis where spot densities are quantitated 
by reference to gel calibration strips containing known amounts of radiolabelled 
protein or against the integrated optical density of all spots visualised (Vandekerckhove 
et al., 1990; Celis et al., 1990b; Celis and Olsen, 1994; Garrels, 1989; Latham, 
Garrels and Solter, 1993; Fey et al., 1994). All approaches effectively allow spots to 
be normalised against the total disintegrations per minute loaded onto the gel. 
Limitations that remain with radiolabelling methods are that absolute quantitation is 
not achieved because all proteins have varying amounts of any amino acid, and that 
only easily labelled samples can be investigated. Quantitative silver staining presents 
an alternative (Giometti et al., 1991; Harrington et al., 1992; Rodriguez et al., [year 
illegible]; Myrick et al., 1993), which when undertaken with [35S]thiourea (Wallace 
and Saluz, 1992a,b) is of extremely high sensitivity. 
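
The normalisation step mentioned above can be sketched as follows. This is an illustration 
only: the function name, the use of SSP numbers as dictionary keys, and the choice of 
parts per million as the output unit are assumptions made here, not details taken from 
the cited studies. 

```python
from typing import Dict


def normalise_to_ppm(spot_dpm: Dict[str, float],
                     total_loaded_dpm: float) -> Dict[str, float]:
    """Express each spot's signal as parts per million of the total load.

    spot_dpm: measured disintegrations per minute for each spot, keyed by
        a spot identifier (hypothetical SSP numbers in the example below).
    total_loaded_dpm: total disintegrations per minute loaded onto the gel.
    """
    if total_loaded_dpm <= 0:
        raise ValueError("total_loaded_dpm must be positive")
    return {spot: 1_000_000.0 * dpm / total_loaded_dpm
            for spot, dpm in spot_dpm.items()}


if __name__ == "__main__":
    # Hypothetical measurements for two spots on one gel
    measured = {"1204": 500.0, "3311": 1500.0}
    print(normalise_to_ppm(measured, total_loaded_dpm=2_000_000.0))
```

Normalising in this way makes spot values comparable between gels loaded with different 
amounts of radioactivity, but, as the text notes, it does not give absolute protein 
quantities. 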

When protein spots from samples prepared under different conditions are quantitated 
and matched from gel to gel, it becomes possible to examine changes and patterns in 
protein expression. Large scale investigation of up- and down-regulation of proteins 
[illegible] be undertaken. For example, simian virus 40 
transformed human keratinocytes were shown to have 177 up-regulated and down-regulated 
proteins compared to normal keratinocytes (Celis and Olsen, 1994); detailed 
synthesis profiles of 1200 proteins have been established in 1 to 4 cell mouse embryos 
(Latham et al., 1991, 1992); and 4 proteins out of [number illegible] were [illegible] 
cadmium toxicity in urinary proteins (Myrick et al., 1993). Complex global changes 
in protein expression as a result of gene disruptions have also been investigated (S. Fey 
and P. Mose-Larsen, personal communication). Impressively, large gel sets showing 
protein expression under different conditions can be globally investigated using 
statistical methods that find groups of related objects within a set. For example, the 
REF52 rat cell line database, consisting of 79 gels from 12 experimental [illegible] where 
each gel contains quantitative data for 1600 crossmatched proteins, has been investigated 
by cluster analysis (Garrels et al., 1990). This revealed [illegible] that, for 
example, were induced or repressed similarly under simian virus 40 or [illegible] 
transformation, suggesting a common mechanism. Proteins [illegible] 
or repressed during culture growth to confluence were also found. It is clear that the 
potential for investigation of cellular control mechanisms [illegible] is 
immense. It is equally clear that investigations of gene expression of this [illegible] are 
currently technically impossible using nucleic acid-based [illegible]. 

Argonne Protein [illegible] 
Rat liver epithelial database 
REF52 rat cell line database 

SWISS-2DPAGE containing [illegible] 
Yeast Protein Database (YPD) 
and Yeast Electrophoretic 
Protein Database (YEPD) 
[illegible] growth conditions 

Identification of disease markers; two separate databases have [illegible] 

Extensive identifications; quantitative spot measurements of transformed cells; 
identification of disease markers 

Quantitative spot measurements through 1 to 4-cell stage 

Documents changes due to exposure to ionizing radiation and toxic chemicals 

Detailed subcellular fractionation studies 

Extensive studies on regulation of proteins by drugs and toxic agents 

Accessible via World Wide Web; quantitative spot measurements under different 
conditions 

Accessible via World Wide Web; completely integrated with SWISS-PROT and 
SWISS-3DIMAGE 

Completely crossreferenced organism database. YPD has extensive information on over 
3500 proteins; YEPD has many identifications 



Baker et al., 1992 
Corbett et al., 1994b 
Jungblut et al., 1994 
Celis et al., [year illegible] 
Celis et al., [year illegible] 
Celis and Olsen, 1994 

Latham et al., 1991 
Latham et al., [year illegible] 

Giometti, Taylor and Tollaksen, 1992 

Wirth et al., [year illegible]; Wirth et al., [year illegible] 

Anderson and Anderson, 1991 
Anderson et al., 1992 
Richardson, Horn and Anderson, 1994 
Garrels and Franza, 1989 
Boutell et al., [year illegible] 

Appel et al., [year illegible] 
Hochstrasser et al., 1992 
Hughes et al., [year illegible] 
Golaz et al., [year illegible] 

Garrels et al., 1994 

FEATURES OF PROTEOME DATABASES 



Proteome projects rely heavily on computer databases to store information about all 
proteins expressed by an organism. Proteome databases should contain [illegible] 
information of proteins already characterised elsewhere, as well as protein data from 
2-D gels such as apparent pI and MW, expression level under different conditions, 
subcellular localisation, and information on post-translational modifications. Images 
of reference 2-D gels, showing protein SSP numbers and protein identifications, 
should also be included. Ideally, proteome databases should be accessible with 
Macintosh or IBM personal computers and easy to use. Some proteome databases and 
the areas they cover are listed in Table 3. Databases range from collections of 
annotated gels to large databases of images integrated with protein and nucleic acid 
sequence banks. 

One example of an .ntesrated proteome database is the su.te of SWISS prot 
SWISS-2DP.AGE and SWlSSoDlMAGE databases (AppeW wVi^J" 
1994: Appel. Ba.roch and Hoehstrasser. 1994; Bairoch and Bn^ X?£ 
feature* of these three databases are listed in Tuhle V. SWISS -PROT SWK< 
2DPAGE and SWISMDIMAGE are accessible through th WorU Wide "b 



Table 4: The SWISS-PROT, SWISS-2DPAGE and SWISS-3DIMAGE suite of cross-linked databases

SWISS-PROT
  Information: Text entries of sequence data; citation information; taxonomic data.
    38,303 entries in Release 29.
  Annotations: Protein function. Post-translational modifications. Domains. Secondary
    structure. Quaternary structure. Diseases associated with protein. Sequence
    conflicts.
  Cross-referenced databases: SWISS-2DPAGE, SWISS-3DIMAGE, EMBL, PIR, PDB, OMIM,
    PROSITE, Medline, FlyBase, GCRDb, MaizeDB, WormPep, DictyDB.
  Other features: Navigation to other SWISS databases achieved by selecting [...] with
    computer mouse.

SWISS-2DPAGE
  Information: 2-D gel images of human liver, plasma, HepG2, HepG2 secreted proteins,
    red blood cell, lymphoma, cerebrospinal fluid, macrophage-like cell line,
    erythroleukemia cell, platelet.
  Annotations: Gel images where protein is found. How protein identified. Protein pI
    and MW. Protein number. Normal and pathological variants.
  Cross-referenced databases: SWISS-PROT and all other databases accessible through
    SWISS-PROT.
  Other features: Gel images show position of identified proteins, or region of gel
    where protein should appear.

SWISS-3DIMAGE
  Information: Collection of [...] 3-D images of proteins.
  Annotations: All annotations [...] available in SWISS-PROT.
  Cross-referenced databases: SWISS-PROT and all other databases accessible through
    SWISS-PROT.
  Other features: Mono and stereo images available. Images can be transferred to local
    computer image viewing programs.
(Berners-Lee et al., 1992), [...] any computer [...] the stored information and images.
Navigation [...] is seamless, as all potential crosslinks are highlighted [...] and can
be selected with a computer mouse. From [...] information about a protein, including
amino acid sequence and [...] modifications, can be obtained. The precise [...] can be
viewed if known, and the 3-D structure [...] if available. References to nucleic acid
and other [...] are also given to provide access to information stored elsewhere.

'Organism' databases, containing detailed [...] about a species, are becoming common
and [...]. These differ from nucleic acid or protein sequence databases [...] SWISS-PROT
because they are image [...] chromosomal map positions, transcription [...] genes [...]
patterns. The Escherichia coli gene-protein database (VanBogelen and Neidhardt, [...];
VanBogelen et al., 1992), known as ECO2DBASE, [...] names, 2-D gel spot information
(including pI and MW estimates, [...] identification), genetic information (GenBank or
EMBL code, [...] location on Kohara [...]; Kohara, Akiyama and Isono, 1987), and
regulatory information ([...] member of regulon or stimulon). All entries in the
ECO2DBASE are also cross-referenced to the SWISS-PROT database [...] available
information about a particular [...]. [...] no consistent manner in which organism
databases are assembled, which may hamper comparisons in the future.

Identification and characterisation of proteins from 2-D gels

The number of proteins identified on a 2-D reference map [...] a research and reference
tool. [...] proportion of proteins identified, a major [...] of current [...] many
proteins from 2-D maps in order to define them [...] protein databases or as unknown.
[...] proteins from [...] with a minimum of time and effort. Traditionally, proteins
from 2-D gels have been [...] comigration of unknown proteins with known proteins, or by
overexpression of homologous genes of interest in the [...]. [...] expensive or time and
labour intensive [...]. [...] hierarchical approach to mass protein identification [...].



Table 5: Hierarchical analysis for mass screening of [...]. Rapid and [...]

Identification technique, in order of use, with references

Amino acid analysis
    Jungblut et al., [...]; Hobohm, Houthaeve and Sander, [...]; [...]
Amino acid analysis with N-terminal sequence tag
    Wilkins et al., submitted
Peptide-mass fingerprinting
    Mann, Hojrup and Roepstorff, 1993; Yates et al., 1993; Mortz et al., 1994;
    Sutton et al., 1995
Combination of amino acid analysis and peptide mass fingerprinting
    Cordwell et al., 1995; Wasinger et al., 1995
Mass spectrometry sequence tag
    Mann and Wilm, 1994
Extensive N-terminal Edman microsequencing
    Matsudaira, 1987
Internal peptide Edman microsequencing
    Rosenfeld et al., 1992; Hellman et al., [...]
Microsequencing by mass spectrometry (electrospray ionisation, post-source decay
MALDI-TOF)
    Johnson and Walsh, [...]
Ladder sequencing
    Bartlet-Jones et al., 1994

[...] use of rapid and cheap identification tools such as amino acid analysis and
peptide mass fingerprinting as first steps in protein identification, [...] the use of
slower, more expensive and time-consuming [...]. [...] construction of this hierarchy
[...] of the data created has been considered, as [...] machine time per sample, the
analysis of data can be [...] and time consuming. Amino acid analysis and peptide mass
fingerprinting [...] techniques in the hierarchy are discussed [...] identification
techniques [...].
PROTEIN IDENTIFICATION BY AMINO ACID COMPOSITION

Tne ammo acid compo.iiion of „„,.-;„< ™™P os,l,on ^ of P'w.n. in cla.aha.es. 

mini. „, ^^J,£ ^T™„I "; en,branC - hl0 " cd I 1 ™** and 
^^^^^ 
[Figure 4 printout: measured amino acid composition of the sample, the pI and MW ranges
searched, and two ranked lists of the closest SWISS-PROT entries with rank, score,
protein, pI, MW and description columns.]

Figure 4. Computer printout from ExPASy server where the [...] large score difference
between the first and second ranking proteins [...] the top ranking protein is the
correct identification (from Wilkins et al., 1995).

[...] chromatography-based analysis. Proteins blotted to PVDF membranes can be hydrolysed
in 1 h at [...]°C, amino acids extracted in a single brief step, and each sample
automatically derivatised and separated by chromatography in under 40 minutes (Wilkins
et al., 1995; Ou et al., 1995). In this manner, one operator can routinely analyse 100
proteins per week on one HPLC unit. This technology lends itself to automation, and it
is anticipated that instruments with even greater sample throughput will be developed.

[...] been prepared by micropreparative 2-D electrophoresis (Hanash et al., 1991;
Bjellqvist et al., 1993b), blotted to a PVDF membrane and stained with amido black, any
visible protein spot is of sufficient quantity for amino acid analysis (Cordwell et al.,
1995; Wasinger et al., 1995; Wilkins et al., 1995).

After the amino acid composition of a protein has been determined, computer
programs are used to match it against the calculated compositions of proteins in
databases (Eckerskorn et al., 1988; Sibbald, Sommerfeldt and Argos, 1991; [...];
Shaw, 1993; Hobohm, Houthaeve and Sander, [...]; Wilkins et al., 1995).
Matching is usually done with only 15 or 16 amino acids, as cysteine and
[Figure 5 printout: measured amino acid composition of the sample, the pI and MW ranges
searched, a ranked list of the closest SWISS-PROT entries with pI and MW, and the
N-terminal sequences of those entries.]

Figure 5. A PVDF protein spot from an E. coli 2-D reference map [...] same sample [...]
amino acid analysis. The N-terminal [...] SWISS-PROT for E. coli, the [...] list of best
matches [...] SWISS-PROT for those entries. The top ranking identification of [...] large
score difference between the first and second [...] the correct protein identification.
However, the sequence tag MLKR [...].

To date, amino acid composition has been used to identify [...] lymphocyte, and mouse
brain (Cordwell et al., 1995; Wasinger et al., [...]).
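
The composition matching described in this section can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the algorithm of any
particular server: the sum-of-squared-differences score, the +/-20% MW window, the
function names and the toy `database` layout (name mapped to sequence and MW) are all
hypothetical choices made for the example, and pI filtering is left out for brevity.

```python
from collections import Counter

# Acid hydrolysis converts Asn/Gln to Asp/Glu, so they are pooled as Asx/Glx.
# Cys and Trp are left out of the comparison (assumption: they are not measured).
GROUPS = {"D": "Asx", "N": "Asx", "E": "Glx", "Q": "Glx"}
SKIPPED = {"C", "W"}


def composition(sequence):
    """Return molar percent composition of a sequence over the measured groups."""
    counts = Counter(GROUPS.get(aa, aa) for aa in sequence if aa not in SKIPPED)
    total = sum(counts.values())
    return {group: 100.0 * n / total for group, n in counts.items()}


def score(measured, calculated):
    """Sum of squared differences in molar percent; 0 means identical."""
    groups = set(measured) | set(calculated)
    return sum((measured.get(g, 0.0) - calculated.get(g, 0.0)) ** 2 for g in groups)


def rank_candidates(measured, database, mw, mw_window=0.2):
    """Rank entries whose MW lies within +/- mw_window of the gel estimate.

    database maps entry name -> (sequence, molecular weight); hypothetical layout.
    Returns (score, name) pairs, best match first.
    """
    low, high = mw * (1 - mw_window), mw * (1 + mw_window)
    hits = [
        (score(measured, composition(seq)), name)
        for name, (seq, entry_mw) in database.items()
        if low <= entry_mw <= high
    ]
    return sorted(hits)
```

A large gap between the first and second scores in the returned list is the kind of
signal the text treats as a confident identification; a small gap calls for a second
technique.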

PROTEIN IDENTIFICATION BY AMINO ACID COMPOSITION AND N-TERMINAL SEQUENCE TAG

When samples from 2-D gels are not unambiguously identified by amino acid composition,
pI and MW, often the correct [...]. Taking advantage of this observation [...] the mass
spectrometry 'sequence tag' concept (Mann and Wilm, [...]) [...] combined Edman
degradation and amino acid analysis [...] protein identification (Wilkins et al.,
submitted). This involves N-terminal sequencing [...] PVDF-blotted proteins by Edman
degradation for [...] cycles, [...] which the same sample is used for amino acid
analysis. [...] amino acids [...] removed from the protein, its composition is not [...].
Furthermore, since only a small amount of protein sequence [...] Edman degradation
cycles can be used [...]. [...] should allow 3 cycles to be completed in 1 h, thereby
[...] proteins per week on one automated, multicartridge [...]. [...] composition, pI and
MW of proteins are matched [...] as described [...], then N-terminal sequences of best
matching [...] with the 'sequence tag' to confirm the protein identity (Figure 5). This
[...] is useful [...] proteins are N-terminally blocked, but [...] amino acids are
susceptible to the acetyl, formyl, or pyroglutamyl [...], this may itself provide useful
information [...] identification. A strength of N-terminal sequence tag and amino acid
composition [...] identification [...] that data generated are quickly and easily
interpreted.
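
The confirmation step described above, where the N-terminal sequences of the
best-matching database entries are compared with the short Edman 'sequence tag', can be
sketched as follows. This is a hedged illustration only: the function name, the
`n_termini` lookup and the entry names are hypothetical, and no allowance is made for
processed or blocked N-termini.

```python
def confirm_by_sequence_tag(ranked_names, n_termini, tag):
    """Keep ranked candidates whose database N-terminus begins with the tag.

    ranked_names: candidate names, best composition match first.
    n_termini:    name -> N-terminal sequence taken from the database entry.
    tag:          the residues read by a few cycles of Edman degradation.
    Returns (rank, name) pairs so the original composition rank stays visible.
    """
    return [
        (rank, name)
        for rank, name in enumerate(ranked_names, start=1)
        if n_termini.get(name, "").startswith(tag)
    ]
```

A candidate that is not ranked first by composition alone can still be singled out
when it is the only entry in the list that carries the tag.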

PROTEIN IDENTIFICATION BY PEPTIDE MASS FINGERPRINTING

Techniques for the identification of proteins by peptide mass fingerprinting have
recently been described (Henzel et al., [...]; Pappin, Hojrup and Bleasby, 1993; James
et al., 1993; Mann, Hojrup and [...]; Mortz et al., [...]; Sutton et al., 1995). This
involves the generation of peptides from [...] using residue-specific enzymes, the
determination of [...], and the matching of these masses against [...] from protein
sequence databases. As proteins have different [...], their peptides should produce
characteristic fingerprints.

The first step in peptide mass fingerprinting is protein [...]. Proteins within a [...]
matrix or bound to PVDF can be [...], although [...] digests are reported to produce
[...] subsequent peptide mass analysis (Rasmussen et al., 1994; Mortz et al., 1994). The
enzyme of choice [...] modified sequencing grade, but other enzymes [...] have also been
used (Pappin, Hojrup and [...]). To maximise the number of peptides obtained, it is
desirable for [...] to ensure that disulfide bonds of the protein are broken [...].
[...] (aspartic acid specific) and [...] (Nikodem and Fresco, 1979; Crimmins et al.,
[...]).

After proteins are digested, peptide masses [...]. Direct analysis of peptide mixtures
can be achieved [...] mass spectrometry [...] because of its higher sensitivity and
greater tolerance to [...] substances from 2-D gels (James et al., 1993; Mortz et al.,
[...]). [...] recent modifications to [...] solved [...] difficulties experienced with
the calibration [...] (Vorm and Mann, [...]; Vorm, Roepstorff and Mann, [...]). [...] of
mass spectrometry allows a small fraction of a digest of [...] protein spot to be used
for analysis, and analysis itself is complete [...].

A major challenge associated with peptide mass fingerprinting [...] prior to computer
matching against [...]. [...] must be examined carefully to determine which [...]
peptide masses of interest, as there are often enzyme autodigestion [...] substances
present (Henzel et al., 1993; Mortz et al., [...]; Rasmussen et al., [...]).
Furthermore, if protein alkylation and reduction has not been undertaken prior to
protein digestion, peptide sequence coverage may be [...] masses present represent
disulfide bonded [...] the protein (Mortz et al., 1994). For eukaryotes, a serious [...]
by the presence of post-translational modifications [...] the unmodified peptide alone
can be very difficult. [...] modifications introduced by electrophoresis, [...] to
cysteine and the oxidation of methionine, are also known to alter peptide masses [...].
A number of computer programs are available for matching peptide mass [...] databases
([...] Cottrell, 1994). Matching is [...] manner, whereby peaks of mass 500-3000 Da are
selected and matched [...] various search parameters including MW of protein, mass
accuracy of peptides, and number of missed enzyme cleavages allowed (Henzel et al.,
1993; Mortz et al., [...]; Rasmussen et al., 1994). The correct protein identity is the
protein which has the most peptide masses in common with the unknown sample.
Identification [...] with as few as three peptides, but unambiguous identification is
thought to require a mass spectrometric map covering most peptides of the protein (Mortz
et al., [...]; Yates et al., 1993). To date peptide mass fingerprinting of proteins has
been undertaken from the human myocardial protein and keratinocyte maps, from an [...]
2-D gel, and from reference maps of Spiroplasma melliferum and [...] (Sutton et al.,
1995; Rasmussen et al., 1994; Henzel et al., 1993; Cordwell et al., [...]; Wasinger et
al., 1995), although the technique is most powerful [...] other protein identification
[...].
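
The matching procedure outlined above (theoretical digestion, a 500-3000 Da mass window,
a mass tolerance, a limit on missed cleavages, and ranking by the number of shared
peptide masses) can be sketched as below. It is a minimal illustration, not the method
of any program cited in the text. Assumptions: trypsin-like cleavage after K or R but
not before P, monoisotopic masses of unmodified peptides, a 0.5 Da tolerance, and
hypothetical function and parameter names.

```python
# Monoisotopic residue masses (Da); a free peptide adds the mass of one water.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def cleave(sequence):
    """Split after K or R unless the next residue is P (assumed trypsin rule)."""
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        following = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and following != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    return pieces


def theoretical_masses(sequence, missed_cleavages=1, window=(500.0, 3000.0)):
    """Masses of all peptides with up to missed_cleavages uncut sites, in the window."""
    pieces = cleave(sequence)
    masses = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            mass = peptide_mass("".join(pieces[i:j + 1]))
            if window[0] <= mass <= window[1]:
                masses.append(mass)
    return masses


def matched_peaks(observed, sequence, tolerance=0.5, missed_cleavages=1):
    """Count observed peaks (500-3000 Da) that match a theoretical peptide mass."""
    theoretical = theoretical_masses(sequence, missed_cleavages)
    return sum(
        any(abs(peak - mass) <= tolerance for mass in theoretical)
        for peak in observed
        if 500.0 <= peak <= 3000.0
    )


def identify(observed, database, **search):
    """Rank database entries (name -> sequence) by number of matched peaks."""
    hits = [(matched_peaks(observed, seq, **search), name)
            for name, seq in database.items()]
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))
```

Ranking by a raw count of shared masses follows the rule stated in the text; it takes
no account of protein length or of modified peptides, which is one reason a second,
independent technique is useful.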
MASS SPECTROMETRY SEQUENCE TAGGING

An extension of peptide mass fingerprinting has recently been described called
peptide sequence tagging (Mann and Wilm, 1994; Mann, 1995). This uses tandem
mass spectrometry (MS/MS) to initially determine the mass of peptides, then subject
them to fragmentation by collision with a gas, and finally determine the mass of
fragments. The resulting spectra give information about a peptide's amino acid
sequence. The fragmentation masses of peptides can rarely be used to assign a complete
sequence, but usually allow a short 'sequence tag' of 2 or 3 amino acids to be
determined. This sequence tag and the original peptide mass are matched by computer
against a database [...].

The major drawback for this technique as a mass screening tool is the complexity of the
mass data generated and the high level of expertise required for [...].
Nevertheless, it represents a useful new protein identification method which [...]
increases the power of peptide mass fingerprinting protein identification.

Cross-species protein identification 

Protein sequence databases continue to grow at a rapid rate, [...] not widely
appreciated that close to 90% of all information contained in current protein databases
comes from only [...] species (A. Bairoch, Pers. Comm.). Fortunately, this [...]
can be used to study proteomes of organisms that are poorly defined at the [...]
electrophoresis and 'cross-species' protein identification (Cordwell et al., [...];
Wasinger et al., 1995). This approach allows proteins from reference maps
of many different species to be identified without the need for the [...]
to be cloned and sequenced. This is particularly true for 'housekeeping' [...]
enzymes involved in glycolysis, DNA manipulation and protein [...],
which are highly conserved across species boundaries. Proteins that cannot be
identified across species boundaries can then become the focus of further [...]
characterisation and DNA sequencing efforts.
Rapid cross-species identification of proteins from 2-D reference maps [...] be
undertaken with amino acid composition or peptide mass fingerprinting [...]
(Figure 6), but these techniques alone may not identify proteins unambiguously when
phylogenetic cross-species distances are great or analysis data is of poor quality (Yates
et al., 1993; Shaw, 1993; Cordwell et al., 1995). However, very high confidence in
protein identities can be achieved when lists of best-matching proteins generated by
both techniques are compared (Cordwell et al., 1995; Wasinger et al., 1995). The
correct identification is found when the same protein is ranked highly in lists of best
matches generated by both techniques. This method has allowed approximately [...]
proteins from the reference map of the mollicute Spiroplasma melliferum, representing
approximately one quarter of the proteome, to be confidently identified by
reference to protein information from other species (S. Cordwell, Personal
Communication). When cross-species protein identification is to be undertaken it should
be noted that the molecular weight of a protein type across species is usually highly
conserved, but that protein pI can vary by more than 2 units (Cordwell et al., 1995).
Accurate molecular weight determination by direct mass spectrometry of proteins
blotted to PVDF (Eckerskorn et al., 1992) should therefore be a useful additional
parameter for cross-species protein identification.
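
The comparison of the two best-match lists can be sketched as follows. This is a hedged
illustration only: keeping entries that appear in the top `top_n` of both lists and
ordering them by the sum of their ranks is an assumed rule chosen for the example, and
the function name and entry names are hypothetical.

```python
def consensus(aa_ranked, pmf_ranked, top_n=10):
    """Return entries ranked within top_n by both techniques, best combined rank first.

    aa_ranked:  entry names ordered by amino acid composition score.
    pmf_ranked: entry names ordered by peptide mass fingerprinting score.
    """
    aa_rank = {name: i for i, name in enumerate(aa_ranked[:top_n], start=1)}
    pmf_rank = {name: i for i, name in enumerate(pmf_ranked[:top_n], start=1)}
    shared = aa_rank.keys() & pmf_rank.keys()
    # Ties on combined rank are broken alphabetically so the output is deterministic.
    return sorted(shared, key=lambda name: (aa_rank[name] + pmf_rank[name], name))
```

An empty result means that neither list supports the other, which under the approach
described above marks the protein as a candidate for further characterisation.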

CHARACTERISATION OF POST-TRANSLATIONAL MODIFICATIONS

Many proteins are modified after translation. Such post-translational modifications,
including glycosylation, phosphorylation, and sulfation (see Table 6), are usually
necessary for protein function or stability. Some abnormal modifications are
associated with disease (Duthel and Revol, 1993; Ghosh et al., 1993; Yamashita et al.,
1993). In proteome studies, post-translational modifications can be examined on all
proteins present, or on individual spots. Studies on all proteins provide an indication
of which proteins may carry a certain type of modification. For example, 2-D gel
analysis of cell cultures grown in the presence of [3H] mannose or [32P] phosphate
gives an indication of which proteins carry glycans containing mannose and which
proteins are phosphorylated (Garrels and Franza, 1989). Lectin binding studies of
gels blotted to PVDF or nitrocellulose provide information on the saccharides, if any,
that are carried by proteins present (Gravel et al., 1994).

When individual proteins of interest carrying post-translational modifications have
been found, micropreparative 2-D electrophoresis can be used to purify them in
microgram quantities (Hanash et al., 1991; Bjellqvist et al., 1993b). If protein
isoforms of similar MW and pI are to be studied, focusing with narrow range pI
gradients (1 pH unit) can provide greater separation and resolution. After
electrophoresis, the type and degree of protein phosphorylation can be investigated
(Murthy and Iqbal, 1991; Gold et al., 1994), monosaccharide composition can be determined
(Weitzhandler et al., 1993; Packer et al., 1995), and the structure and exact site of
glycoamino acids can be investigated by either Edman degradation based techniques
or by mass spectrometry (Pisano et al., 1993; Huberty et al., 1993; Carr, Huddleston
and Bean, 1993). With further development of rapid techniques, investigation of
phosphorylation and monosaccharides by chromatographic or mass spectrometric
means is likely to become a routine step in the characterisation of post-translational
modifications of proteins from reference maps.
The status of proteome projects

Many technical aspects of proteome research have already [...] review, but an overview
of the status of proteome projects [...]. Advances in proteome projects will [...]
initiatives, to enable an identity, amino acid [...] each protein spot. Table 7 shows
genome size, proteome size [...] number of proteins already defined for a number of
model organisms. [...] sequencing programs for E. coli and S. [...], whilst [...] size of
some other genomes (and especially the [...]) [...] (Wasinger et al., 1995; VanBogelen
et al., [...]). [...] maps of other organisms will take longer to construct. However,
[...] species protein identification techniques will allow [...] and simple eukaryotes to
be [...].

Species name               Haploid genome size   Estimated proteome size   Protein entries
                           (million bp)          (total proteins)          in SWISS-PROT

Mycoplasma species         0.6-0.8               400-600                   ?
Escherichia coli           4.8                   ?                         3170
Saccharomyces cerevisiae   13.5                  ?                         3160
Dictyostelium discoideum   70                    ?                         ?
Arabidopsis thaliana       70                    ?                         ?
Caenorhabditis elegans     80                    ?                         ?
[...]                      2900                  ?                         ?

A final column, proteins annotated on 2-D maps, survives only as four entries whose row
assignment is not recoverable: > [...], > [...], > [...], > 1000.

The study of vertebrate proteomes and vertebrate development is a [...] undertaking in
comparison to the investigation of [...]. This is because vast numbers of proteins are
developmentally [...] hundreds of unique proteins, and there [...]. [...] estimated that
at least 35% of proteins in vertebrate cells will be [...] to [...] tissues,
constituting the 'housekeeping' proteins (Bird, 1995), with [...] proteins constituting
a set that are specific to a cell [...]. [...] electrophoretic conditions are used,
reference [...] one [...] can be superimposed in gel databases [...]. [...] the
definition of the [...] of proteins that [...] unique to different tissue types. Such
studies [...] useful in providing focus for nucleic acid sequencing [...].
FUTURE DIRECTIONS OF PROTEOME PROJECTS

This review has described recent advances in the area of [...] illustrated how new
developments of older [...] analysis [...] greatly widened the choice of tools the
biologist and protein [...] separation, identification and analysis of complex mixtures
[...]. [...] possible the establishment of detailed reference maps for organisms, which
[...] becoming the method of choice for the definition of tissues or whole cells, and
the investigation of gene expression therein.

Proteome projects are already impacting on the [...] of different tissues of a single
organism are often [...] cross-species identification of proteins (for example, [...]
from [...] by comparison [...]) [...] organisms that are poorly molecularly defined. As
[...] identification can proceed at a pace orders of magnitude faster than [...]
defining the gene and protein [...] sequencing of genomes will be avoided, and [...].

Just as genome sequencing is not an end in itself, neither is an annotated [...]
reference map of an organism, nor indeed the identification of proteins [...].
So whilst an immediate aim of proteome projects is to screen proteins [...]
maps, this will lead to expression studies and characterisation of [...]
modifications. The challenge that then needs to be [...]
structure and function of proteins in a proteome. The magnitude of this [...]
the fact that over half the open reading frames [...]
III were initially of no known function (Oliver et al., 1992). [...]
studies will be an undertaking just as formidable [...] studies are now and
proteome projects are becoming, but will lead to [...].
Differential gene expression in drug metabolism and
toxicology: practicalities, problems and potential

Received January 8, 1999

1. An important feature of the work of many molecular biologists is identifying which
genes are switched on and off in a cell under different environmental conditions or
subsequent to xenobiotic challenge. Such information has many uses, including the
deciphering of molecular pathways and facilitating the development of new experimental
and diagnostic procedures. However, the student of gene hunting should be forgiven for
perhaps becoming confused by the mountain of information available, as there appear to be
almost as many methods of discovering differentially expressed genes as there are research
groups using the technique.

2. The aim of this review was to clarify the main methods of differential gene expression 
analysis and the mechanistic principles underlying them. Also included is a discussion on 
some of the practical aspects of using this technique. Emphasis is placed on the so-called 
'open' systems, which require no prior knowledge of the genes contained within the study
model. Whilst these will eventually be replaced by 'closed' systems in the study of human, 
mouse and other commonly studied laboratory animals, they will remain a powerful tool for 
those examining less fashionable models. 

3. The use of suppression-PCR subtractive hybridization is exemplified in the
identification of up- and down-regulated genes in rat liver following exposure to
phenobarbital, a well-known inducer of the drug metabolizing enzymes.

4. Differential gene display provides a coherent platform for building libraries and 
microchip arrays of 'gene fingerprints' characteristic of known enzyme inducers and
xenobiotic toxicants, which may be interrogated subsequently for the identification and 
characterization of xenobiotics of unknown biological properties. 



Introduction 

It is now apparent that the development of almost all cancers and many
non-neoplastic diseases is accompanied by altered gene expression in the affected cells
compared to their normal state (Hunter 1991, Wynford-Thomas 1991, Vogelstein
and Kinzler 1993, Semenza 1994, Cassidy 1995, Kleinjan and Van Heyningen 1998).
Such changes also occur in response to external stimuli such as pathogenic
micro-organisms (Rohn et al. 1996, Singh et al. 1997, Griffin and Krishna 1998, Lunney
1998) and xenobiotics (Sewall et al. 1995, Dogra et al. 1998, Ramana and Kohli
1998), as well as during the development of undifferentiated cells (Hecht 1998,
Rudin and Thompson 1998, Schneider-Maunoury et al. 1998). The potential
medical and therapeutic benefits of understanding the molecular changes which
occur in any given cell in progressing from the normal to the 'altered' state are
enormous. Such profiling essentially provides a 'fingerprint' of each step of a
cell's development or response and should help in the elucidation of specific and
sensitive biomarkers representing, for example, different types of cancer or previous
exposure to certain classes of chemicals that are enzyme inducers.

In drug metabolism, many of the xenobiotic-metabolizing enzymes (including 
the well-characterized isoforms of cytochrome P450) are inducible by drugs and 
chemicals in man (Pelkonen et al. 1998), predominantly involving transcriptional
activation of not only the cognate cytochrome P450 genes, but additional cellular
proteins which may be crucial to the phenomenon of induction. Accordingly, the
development of methodology to identify and assess the full complement of genes
that are either up- or down-regulated by inducers is crucial in the development of
knowledge to understand the precise molecular mechanisms of enzyme induction
and how this relates to drug action. Similarly, in the field of chemical-induced 
toxicity, it is now becoming increasingly obvious that most adverse reactions to 
drugs and chemicals are the result of multiple gene regulation, some of which are 
causal and some of which are casually-related to the toxicological phenomenon per
se. This observation has led to an upsurge in interest in gene-profiling technologies 
which differentiate between the control and toxin-treated gene pools in target tissues 
and is, therefore, of value in rationalizing the molecular mechanisms of xenobiotic- 
induced toxicity. Knowledge of toxin-dependent gene regulation in target tissues is 
not solely an academic pursuit as much interest has been generated in the 
pharmaceutical industry to harness this technology in the early identification of toxic 
drug candidates, thereby shortening the developmental process and contributing 
substantially to the safety assessment of new drugs. For example, if the gene profile 
in response to say a testicular toxin that has been well-characterized in vivo could be 
determined in the testis, then this profile would be representative of all new drug 
candidates which act via this specific molecular mechanism of toxicity, thereby 
providing a useful and coherent approach to the early detection of such toxicants. 
Whereas it would be informative to know the identity and functionality of all genes 
up/down regulated by such toxicants, this would appear a longer term goal, as the 
majority of human genes have not yet been sequenced, far less their functionality 
determined. However, the current use of gene profiling yields a pattern of gene 
changes for a xenobiotic of unknown toxicity which may be matched to that of well- 
characterized toxins, thus alerting the toxicologist to possible in vivo similarities 
between the unknown and the standard, thereby providing a platform for more 
extensive toxicological examination. Such approaches are beginning to gain
momentum, in that several biotechnology companies are commercially producing
'gene chips' or 'gene arrays' that may be interrogated for toxicity assessment of
xenobiotics. These chips consist of hundreds/thousands of genes, some of which are
degenerate in the sense that not all of the genes are mechanistically-related to any
one toxicological phenomenon. Whereas these chips are useful in broad-spectrum
screening, they are maturing at a substantial rate, in that gene arrays are now 
becoming more specific, e.g. chips for the identification of changes in growth factor 
families that contribute to the aetiology and development of chemically-induced 
neoplasias. 

Although documenting and explaining these genetic changes presents a
formidable obstacle to understanding the different mechanisms of development and
disease progression, the technology is now available to begin attempting this difficult
challenge. Indeed, several 'differential expression analysis' methods have been
developed which facilitate the identification of gene products that demonstrate
altered expression in cells of one population compared to another. These methods
have been used to identify differential gene expression in many situations, including
invading pathogenic microbes (Zhao et al. 1998), in cells responding to extracellular
and intracellular microbial invasion (Duguid and Dinauer 1990, Ragno et al. 1997,
Maldarelli et al. 1998), in chemically treated cells (Syed et al. 1997, Rockett et al.
1999), neoplastic cells (Liang et al. 1992, Chang and Terzaghi-Howe 1998),
activated cells (Gurskaya et al. 1996, Wan et al. 1996), differentiated cells (Hara et
al. 1991, Guimaraes et al. 1995a, b), and different cell types (Davis et al. 1984,
Hedrick et al. 1984, Xhu et al. 1998). Although differential expression analysis
technologies are applicable to a broad range of models, perhaps their most important
advantage is that, in most cases, absolutely no prior knowledge of the specific genes
which are up- or down-regulated is required.

The field of differential expression analysis is a large and complex one, with 
many techniques available to the potential user. These can be categorized into 
several methodological approaches, including: 

(1) Differential screening, 

(2) Subtractive hybridization (SH) (includes methods such as chemical cross- 
linking subtraction — CCLS, suppression-PCR subtractive hybridization — 
SSH, and representational difference analysis — RDA), 

(3) Differential display (DD), 

(4) Restriction endonuclease facilitated analysis (including serial analysis of gene 
expression — SAGE — and gene expression fingerprinting — GEF), 

(5) Gene expression arrays, and 

(6) Expressed sequence tag (EST) analysis. 

The above approaches have been used successfully to isolate differentially 
expressed genes in different model systems. However, each method has its own 
subtle (and sometimes not so subtle) characteristics which incur various advantages 
and disadvantages. Accordingly, it is the purpose of this review to clarify the 
mechanistic principles underlying the main differential expression methods and to 
highlight some of the broader considerations and implications of this very powerful 
and increasingly popular technique. Specifically, we will concentrate on the 
so-called 'open' systems, namely those which do not require any knowledge of gene 
sequences and, therefore, are useful for isolating unknown genes. Two 'closed' 
systems (those utilising previously identified gene sequences), EST analysis and the 
use of DNA arrays, will also be considered briefly for completeness. Whilst 
emphasis will often be placed on suppression PCR subtractive hybridization (SSH, 
the approach employed in this laboratory), it is the aim of the authors to highlight, 
wherever possible, those areas of common interest to those who use, or intend to use, 
differential gene expression analysis. 



Differential cDNA library screening (DS) 

Despite the development of multiple technological advances which have recently 
brought the field of gene expression profiling to the forefront of molecular analysis, 
recognition of the importance of differential gene expression and characterization of 
differentially expressed genes has existed for many years. One of the original 
approaches used to identify such genes was described 20 years ago by St John and 
Davis (1979). These authors developed a method, termed 'differential plaque filter 
hybridization', which was used to isolate galactose-inducible DNA sequences from 
yeast. The theory is simple: a genomic DNA library is prepared from normal, 
unstimulated cells of the test organism/tissue and multiple filter replicas are 
prepared. These replica blots are probed with radioactively (or otherwise) labelled 
complex cDNA probes prepared from the control and test cell mRNA populations. 
Those mRNAs which are differentially expressed in the treated cell population will 
show a positive signal only on the filter probed with cDNA from the treated cells. 
Furthermore, labelled cDNA from different test conditions can be used to probe 
multiple blots, thereby enabling the identification of mRNAs which are only 
up-regulated under certain conditions. For example, St John and Davis (1979) screened 
replica filters with acetate-, glucose- and galactose-derived probes in order to obtain 
genes induced specifically by galactose metabolism. Although groundbreaking in its 
time, this method is now considered insensitive and time-consuming, as up to 2 
months are required to complete the identification of genes which are differentially 
expressed in the test population. In addition, there is no convenient way to check 
that the procedure has worked until the whole process has been completed. 
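
The comparison of replica filters described above can be sketched in a few lines of Python. This is only an illustration of the logic: the clone names and the positive/negative calls below are hypothetical and are not data from St John and Davis (1979).

```python
# Hypothetical hybridization calls for each library clone on three replica
# filters. True means the clone gave a positive signal with that probe.
signals = {
    "clone_01": {"acetate": True,  "glucose": True,  "galactose": True},
    "clone_02": {"acetate": False, "glucose": False, "galactose": True},
    "clone_03": {"acetate": False, "glucose": True,  "galactose": False},
    "clone_04": {"acetate": False, "glucose": False, "galactose": False},
}


def induced_only_by(signals, condition):
    """Return clones positive on the `condition` filter and negative on all others."""
    hits = []
    for clone, calls in signals.items():
        other_calls = [call for probe, call in calls.items() if probe != condition]
        if calls[condition] and not any(other_calls):
            hits.append(clone)
    return sorted(hits)


# Clones that light up only with the galactose-derived probe.
galactose_specific = induced_only_by(signals, "galactose")
print(galactose_specific)
```

In practice the calls would come from comparing signal intensities on autoradiographs rather than from clean yes/no values, which is one reason the method is regarded as insensitive.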

Subtractive Hybridization (SH) 

The developing concept of differential gene expression and the success of early 
approaches such as that described by St John and Davis (1979) soon gave rise to a 
search for more convenient methods of analysis. One of the first to be developed was 
SH, numerous variations of which have since been reported (see below). In general, 
this approach involves hybridization of mRNA/cDNA from one population (tester) 
to excess mRNA/cDNA from another (driver), followed by separation of the 
unhybridized tester fraction (differentially expressed) from the hybridized common 
sequences. This step has been achieved physically, chemically and through the use 
of selective polymerase chain reaction (PCR) techniques. 

Physical separation 

Original subtractive hybridization technology involved the physical separation 
of hybridized common species from unique single stranded species. Several methods 
of achieving this have been described, including hydroxyapatite chromatography 
(Sargent and Dawid 1983), avidin-biotin technology (Duguid and Dinauer 1990) 
and oligodT-latex separation (Hara et al. 1991). In the first approach, common 
mRNA species are removed by cDNA (from test cells)-mRNA (from control cells) 
subtractive hybridization followed by hydroxyapatite chromatography, as 
hydroxyapatite specifically adsorbs the cDNA-mRNA hybrids. The unadsorbed cDNA is 
then used either for the construction of a cDNA library of differentially expressed 
genes (Sargent and Dawid 1983, Schneider et al. 1988) or directly as a probe to 
screen a preselected library (Zimmerman et al. 1980, Davis et al. 1984, Hedrick et al. 
1984). A schematic diagram of the procedure is shown in figure 1. 

Less rigorous physical separation procedures coupled with sensitivity enhancing 
PCR steps were later developed as a means to overcome some of the problems 
encountered with the hydroxyapatite procedure. For example, Duguid and Dinauer 
(1990) described a method of subtraction utilizing biotin-affinity systems as a means 
to remove hybridized common sequences. In this process, both the control and 
tester mRNA populations are first converted to cDNA and an adaptor ('oligovector', 

Figure 1. The hydroxyapatite method of subtractive hybridization. cDNA derived from the 
treated/altered (tester) population is mixed with a large excess of mRNA from the control (driver) 
population. Following hybridization, mRNA-cDNA hybrids are removed by hydroxyapatite 
chromatography. The only cDNAs which remain are those which are differentially expressed in 
the treated/altered population. In order to facilitate the recovery of full length clones, small cDNA 
fragments are removed by exclusion chromatography. The remaining cDNAs are then cloned into 
a vector for sequencing, or labelled and used directly to probe a library, as described by Sargent 
and Dawid (1983). 

containing a restriction site) ligated to both sides. Both populations are then 
amplified by PCR, but the driver cDNA population is subsequently digested with 
the restriction endonuclease that cuts within the adaptor. This serves to cleave the 
oligovector and reduce the amplification potential of the control population. The digested 
control population is then biotinylated and an excess mixed with tester cDNA. 
Following denaturation and hybridization, the mix is applied to a biocytin column 
(streptavidin may also be used) to remove the control population, including 
heteroduplexes formed by annealing of common sequences from the tester 
population. The procedure is repeated several times following the addition of fresh 

Figure 2. The use of oligodT30 latex to perform subtractive hybridization. mRNA extracted from the 
control (driver) population is converted to anchored cDNA using polydT oligonucleotides 
attached to latex beads. mRNA from the treated/altered (tester) population is repeatedly 
hybridized against an excess of the anchored driver cDNA. The final population of mRNA is 
tester specific and can be converted into cDNA for cloning and other downstream applications, as 
described by Hara et al. (1991). 

control cDNA. In order to further enrich those species differentially expressed in 
the tester cDNA, the subtracted tester population is amplified by PCR following 
every second subtraction cycle. After six cycles of subtraction (three reamplification 
steps) the reaction mix is ligated into a vector for further analysis. 

In a slightly different approach, Hara et al. (1991) utilized a method whereby 
oligo(dT30) primers attached to a latex substrate are used to first capture mRNA 
extracted from the control population. Following 1st strand cDNA synthesis, the 
RNA strand of the heteroduplexes is removed by heat denaturation and 
centrifugation (the cDNA-oligotex-dT30 forms a pellet and the supernatant is removed). 
A quantity of tester mRNA is then repeatedly hybridized to the immobilized control 
(driver) cDNA (which is present in 20-fold excess). After several rounds of 
hybridization the only mRNA molecules left in the tester mRNA population are 
those which are not found in the driver cDNA-oligotex-dT30 population. These 
tester-specific mRNA species are then converted to cDNA and, following the 
addition of adaptor sequences, amplified by PCR. The PCR products are then 
ligated into a vector for further analysis using restriction sites incorporated into the 
PCR primers. A schematic illustration of this subtraction process is shown in figure 2. 

However, all these methods utilising physical separation have been described as 
inefficient due to the requirement for large starting amounts of mRNA, significant 
loss of material during the separation process and a need for several rounds of 
hybridization. Hence, new methods of differential expression analysis have recently 
been designed to eliminate these problems. 

Chemical Cross-Linking Subtraction (CCLS) 

In this technique, originally described by Hampson et al. (1992), driver mRNA 
is mixed with tester cDNA (1st strand only) in a ratio of >20:1. The common 
sequences form cDNA:mRNA hybrids, leaving the tester specific species as single 
stranded cDNA. Instead of physically separating these hybrids, they are inactivated 
chemically using 2,5-diaziridinyl-1,4-benzoquinone (DZQ). Labelled probes are 
then synthesized from the remaining single stranded cDNA species (unreacted 
mRNA species remaining from the driver are not converted into probe material due 
to specificity of Sequenase T7 DNA polymerase used to make the probe) and used 
to screen a cDNA library made from the tester cell population. A schematic diagram 
of the system is shown in figure 3. 

It has been shown that the differentially expressed sequences can be enriched at 
least 300-fold with one round of subtraction (Hampson et al. 1992), and that the 
technique should allow isolation of cDNAs derived from transcripts that are present 
at less than 50 copies per cell. This equates to genes at the low end of intermediate 
abundance (see table 1). The main advantages of the CCLS approach are that it is 
rapid, technically simple and also produces fewer false positives than other 
differential expression analysis methods. However, like the physical separation 
protocols, a major drawback with CCLS is the large amount of starting material 
required (at least 10 µg RNA). Consequently, the technique has recently been 
refined so that a renewable source of RNA can be generated. The degenerate random 
oligonucleotide primed (DROP) adaptation (Hampson et al. 1996, Hampson and 
Hampson 1997) uses random hexanucleotide sequences to prime solid 
phase-synthesized cDNA. Since each primer includes a T7 polymerase promoter sequence 

[Figure 3 diagram. Steps shown: control (driver) mRNA and test (tester) mRNA; 1st strand 
cDNA synthesis of the tester followed by alkaline hydrolysis; mix and anneal to give 
mRNA:cDNA hybrids and unique cDNA species; cross-linking agent (DZQ) added; hybrids are 
cross-linked; probes synthesised from single stranded cDNA species and used to probe a 
cDNA library.] 

Figure 3. Chemical cross-linking subtraction. Excess driver mRNA is mixed with 1st strand tester 
cDNA. The common sequences form mRNA:cDNA hybrids which are cross linked with 
2,5-diaziridinyl-1,4-benzoquinone (DZQ) and the remaining cDNA sequences are differentially 
expressed in the tester population. Probes are made from these sequences using Sequenase 2.0 
DNA polymerase, which lacks reverse transcriptase activity and, therefore, does not react with the 
remaining mRNA molecules from the driver. The labelled probes are then used to screen a cDNA 
library for clones of differentially expressed sequences. Adapted from Walter et al. (1996), with 
permission. 



Table 1. The abundance of mRNA species and classes in a typical mammalian cell. 
[Table body not recoverable from the scan. Surviving header fragments: class; species/cell; 
in class; species/pg; total RNA. Surviving values: 4; 3.3.] 
Modified from Bertioli et al. (1995). 

at the 5' end, the final pool of random cDNA fragments is a PCR-renewable cDNA 
population which is representative of the expressed gene pool and can be used to 
synthesize sense RNA for use as driver material. Furthermore, if the final pool of 
random cDNA fragments is reamplified using biotinylated T7 primer and random 
hexamer, the product can be captured with streptavidin beads and the antisense 
strand eluted for use as tester. Since both target and driver can be generated from 
the same DROP product, subtraction can be performed in both directions (i.e. for 
up- and down-regulated species) between two different DROP products. 

Representational Difference Analysis (RDA) 

RDA of cDNA (Hubank and Schatz 1994) is an extension of the technique 
originally applied to genomic DNA as a means of identifying differences between 
two complex genomes (Lisitsyn et aL 1993). It is a process of subtraction and 
amplification involving subtractive hybridization of the tester in the presence of 
excess driver. Sequences in the tester that have homologues in the driver are 
rendered unamplifiable, whereas those genes expressed only in the tester retain the 
ability to be amplified by PCR. The procedure is shown schematically in figure 4. 

In essence, the driver and tester mRNA populations are first converted to cDNA 
and amplified by PCR following the ligation of an adaptor. The adaptors are then 
removed from both populations and a new (different) adaptor ligated to the 
amplified tester population only. Driver and tester populations are next melted and 
hybridized together in a ratio of 100:1. Following hybridization, only tester : tester 
homohybrids have 5' adaptors at each end of the DNA duplex and can, thus, be filled 
in at both 3' ends. Hence, only these molecules are amplified exponentially during 
the subsequent PCR step. Although tester : driver heterohybrids are present, they 
only amplify in a linear fashion, since the strand derived from the driver has no 
adaptor to which the primer can bind. Driver : driver homohybrids have no 
adaptors and, therefore, are not amplified. Single stranded molecules are digested 
with mung bean nuclease before a further PCR-enrichment of the tester : tester 
homohybrids. The adaptors on the amplified tester population are then replaced and 
the whole process repeated a further two or three times using an increasing excess of 
driver (Hubank and Schatz used a tester : driver ratio of 1:400, 1:80000 and 
1:800000 for the second, third and fourth hybridizations, respectively). Different 
adaptors are ligated to the tester between successive rounds of hybridization and 
amplification to prevent the accumulation of PCR products that might interfere with 
subsequent amplifications. The final display is a series of differentially expressed 
gene products easily observable on an ethidium bromide gel. 
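
The selective amplification that underlies RDA can be illustrated with a small idealized calculation. The sketch below assumes perfect PCR efficiency and a single starting duplex of each type; the function name and the numbers are illustrative only and are not taken from Hubank and Schatz (1994).

```python
def copies_after_pcr(hybrid, cycles):
    """Idealized product count from one starting duplex after `cycles` of PCR.

    Assumes 100% efficiency per cycle (an illustrative simplification).
    """
    if hybrid == "tester:tester":
        # Primer sites on both strands: the product doubles every cycle.
        return 2 ** cycles
    if hybrid == "tester:driver":
        # Primer site on one strand only: one new strand is added per cycle.
        return 1 + cycles
    if hybrid == "driver:driver":
        # No adaptor, so no primer site: the duplex is not amplified.
        return 1
    raise ValueError(f"unknown hybrid type: {hybrid}")


for hybrid in ("tester:tester", "tester:driver", "driver:driver"):
    print(hybrid, copies_after_pcr(hybrid, cycles=20))
```

Under these assumptions the gap between exponential and linear products grows to several orders of magnitude within 20 cycles, which is why the tester-specific sequences dominate the final display.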

The main advantages of RDA are that it offers a reproducible and sensitive 
approach to the analysis of differentially expressed genes. Hubank and Schatz (1994) 
reported that they were able to isolate genes that were differentially expressed in 
substantially less than 1% of the cells from which the tester is derived. Perhaps the 
main drawback is that multiple rounds of ligation, hybridization, amplification and 
digestion are required. The procedure is, therefore, lengthier than many other 
differential display approaches and provides more opportunity for operator-induced 
error to occur. Although the generation of false positives has been noted, this has 
been solved to some degree by O'Neill and Sinclair (1997) through the use of 
HPLC-purified adaptors. These are free of the truncated adaptors which appear to be a 
major source of the false positive bands. A very similar technique to RDA, termed 
linker capture subtraction (LCS), was described by Yang and Sytowski (1996). 

[Figure 4 diagram. Steps shown for ds control (driver) cDNA and ds test (tester) cDNA: digest 
with restriction enzyme; ligate to dephosphorylated 12/24 adaptor strands; melt 12mer; fill in 
3' ends (Taq), add primer and amplify; digest (driver), or digest and ligate new 12/24 adaptor 
(tester); mix 100:1, melt and hybridize; fill in ends, add primer and amplify, giving linear 
amplification, exponential amplification or no amplification; digest PCR products with mung 
bean nuclease to remove ssDNA molecules present after amplification; first difference product.] 

Figure 4. The representational difference analysis (RDA) technique. Driver and tester cDNA are 
digested with a 4-cutter restriction enzyme such as DpnII. The 1st set of 12/24 adaptor strands 
(oligonucleotides) are ligated to each other and the digested cDNA products. The 12mer is 
subsequently melted away and the 3' ends filled in using Taq DNA polymerase. Each cDNA 
population is then amplified using PCR, following which the 1st set of adaptors is removed with 
DpnII. A second set of 12/24 adaptor strands is then added to the amplified tester cDNA 
population, after which the tester is hybridized against a large excess of driver. The 12mer 
adaptors are melted and the 3' ends filled in as before. PCR is carried out with primers identical 
to the new 24mer adaptor. Thus, the only hybridization products which are exponentially 
amplified are those which are tester : tester combinations. Following PCR, ssDNA products are 
removed with mung bean nuclease, leaving the 'first difference product'. This is digested and a 
third set of 12/24 adaptors added before repeating the subtraction process from the hybridization 
stage. The process is repeated to the 3rd or 4th difference product, as described by Lisitsyn et al. 
(1993) and Hubank and Schatz (1994). 

Suppression PCR Subtractive Hybridization (SSH) 

The most recent adaptation of the SH approach to differential expression 
analysis was first described by Diatchenko et al. (1996) and Gurskaya et al. (1996). 
They reported that a 1000-5000 fold enrichment of rare cDNAs (equivalent to 
isolating mRNAs present at only a few copies per cell) can be obtained without the 
need for multiple hybridizations/subtractions. Instead of physical or chemical 
removal of the common sequences, a PCR-based suppression system is used (see 
figure 5). 

In SSH, excess driver cDNA is added to two portions of the tester cDNA which 
have been ligated with different adaptors. A first round of hybridization serves to 
enrich differentially expressed genes and equalize rare and abundant messages. 
Equalization occurs since reannealing is more rapid for abundant molecules than for 
rarer molecules due to the second order kinetics of hybridization (James and Higgins 
1985). The two primary hybridization mixes are then mixed together in the presence 
of excess driver and allowed to hybridize further. This step permits the annealing of 
single stranded complementary sequences which did not hybridize in the primary 
hybridization, and in doing so generates templates for PCR amplification. Although 
there are several possible combinations of the single stranded molecules present in 
the secondary hybridization mix, only one particular combination (differentially 
expressed in the tester cDNA composed of complementary strands having different 
adaptors) can amplify exponentially. 
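
The equalization effect follows directly from second order kinetics. As a rough sketch, assume a single species reannealing with itself under ideal second order kinetics, dC/dt = -kC^2, so that the single stranded concentration remaining at time t is C(t) = C0 / (1 + k C0 t). The rate constant, time and starting concentrations below are arbitrary illustrative values, not measurements, and the model ignores the driver.

```python
def single_stranded_remaining(c0, k, t):
    """Single stranded concentration left after time t for ideal second order
    reannealing, C(t) = C0 / (1 + k * C0 * t). Units are arbitrary."""
    return c0 / (1.0 + k * c0 * t)


# Hypothetical starting amounts: an abundant message 1000-fold in excess of a rare one.
abundant_start, rare_start = 1000.0, 1.0
k, t = 0.01, 10.0  # arbitrary rate constant and hybridization time

abundant_left = single_stranded_remaining(abundant_start, k, t)
rare_left = single_stranded_remaining(rare_start, k, t)

ratio_before = abundant_start / rare_start
ratio_after = abundant_left / rare_left
print(f"abundant:rare before = {ratio_before:.0f}:1, after = {ratio_after:.1f}:1")
```

Because the abundant species reanneals far faster, the single stranded fraction that remains is much closer in concentration to the rare species than the starting material was. For long hybridization times C(t) approaches 1/(kt) regardless of C0, which is the limiting case of perfect equalization.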

Having obtained the final differential display, two options are available if cloning 
of cDNAs is desired. One is to transform the whole of the final PCR reaction into 
competent cells. Transformed colonies can then be isolated and their inserts 
characterized by sequencing, restriction analysis or PCR. Alternatively, the final 
PCR products can be resolved on a gel and the individual bands excised, reamplified 
and cloned. The first approach is technically simpler and less time consuming. 
However, ligation/transformation reactions are known to be biased towards the 
cloning of smaller molecules, and so the final population of clones will probably not 
contain a representative selection of the larger products. In addition, although 
equalization theoretically occurs, observations in this laboratory suggest that this is 
by no means perfectly accomplished. Consequently, some gene species are present 
in a higher number than others and this will be represented in the final population 
of clones. Thus, in order to obtain a substantial proportion of those gene species that 
actually demonstrate differential expression in the tester population, the number of 
clones that will have to be screened after this step may be substantial. The second 
approach is initially more time consuming and technically demanding. However, it 
would appear to offer better prospects for cloning larger and low abundance gel 
products. In addition, one can incorporate a screening step that differentiates 
different products of different sequences but of the same size (HA-staining, see 
later). In this way, a good idea of the final number of clones to be isolated and 
identified can be achieved. 

An alternative (or even complementary) approach is to use the final differential 
display reaction to screen a cDNA library to isolate full length clones for further 
characterization, or a DNA array (see later) to quickly identify known genes. SSH 
has been used in this laboratory to begin characterization of the short-term gene 
expression profiles of enzyme-inducers such as phenobarbital (Rockett et al. 1997) 
and Wy-14,643 (Rockett et al. unpublished observations). The isolation of 
differentially expressed genes in this manner enables the construction of a fingerprint 

[Figure 5 diagram. Steps shown: tester cDNA with adaptor 1 and tester cDNA with adaptor 2 are 
each hybridized with driver cDNA (in excess); mix samples, add fresh denatured driver, anneal, 
giving molecule types a, b, c, d and e; add primers and amplify by PCR. Outcomes: a, d no 
amplification; b no amplification, suppressed due to formation of panhandle structure; c linear 
amplification; e exponential amplification.] 

Figure 5. PCR-Select cDNA subtraction. In the primary hybridization, an excess of driver cDNA is 
added to each tester cDNA population. The samples are heat denatured and allowed to hybridize 
for between 3 and 8 h. This serves two purposes: (1) to equalize rare and abundant molecules; and 
(2) to enrich for differentially expressed sequences, since cDNAs that are not differentially expressed 
form type c molecules with the driver. In the secondary hybridization, the two primary 
hybridizations are mixed together without denaturing. Fresh denatured driver can also be added 
at this point to allow further enrichment of differentially expressed sequences. Type e molecules 
are formed in this secondary hybridization which are subsequently amplified using two rounds of 
PCR. The final products can be visualized on an agarose gel, labelled directly or cloned into a 
vector for downstream manipulation. As described by Diatchenko et al. (1996) and Gurskaya 
et al. (1996), with permission. 

Figure 6. Flow diagram showing method used in this laboratory to isolate and identify clones of genes 
which are differentially expressed in rat liver following short term exposure to the enzyme 
inducers, phenobarbital and Wy-14,643. 

of expressed genes which are unique to each compound and time/dose point. Such 
information could be useful in short-term characterization of the toxic potential of 
new compounds by comparing the gene-expression profiles they elicit with those 
produced by known inducers. Figure 6 shows a flow diagram of the method used to 
isolate, verify and clone differentially expressed genes, and figure 7 shows expression 
profiles obtained from a typical SSH experiment. Subsequent sub-cloning of the 
individual bands, sequencing and gene data base interrogation reveals many genes 
which are either up- or down-regulated by phenobarbital in the rat (tables 2 and 3). 

One of the advantages in using the SSH approach is that no prior knowledge is 
required of which specific genes are up/down-regulated subsequent to xenobiotic 

J. C. Rockett et al. 

Figure 7. SSH display patterns obtained from rat liver following 3-day treatment with Wy-14,643 or 
phenobarbital. mRNA extracted from control and treated livers was used to generate the 
differential displays using the PCR-Select cDNA subtraction kit (Clontech). Lane 1: 1 kb 
ladder; lane 2: genes upregulated following Wy-14,643 treatment; lane 3: genes downregulated 
following Wy-14,643 treatment; lane 4: genes upregulated following phenobarbital treatment; 
lane 5: genes downregulated following phenobarbital treatment; lane 6: 1 kb ladder. Reproduced 
from Rockett et al. (1997), with permission. 

exposure, and an almost complete complement of genes is obtained. For example, 
the peroxisome proliferator and non-genotoxic hepatocarcinogen Wy-14,643 
up-regulates at least 28 genes and down-regulates at least 15 in the rat (a sensitive 
species) and produces 48 up- and 37 down-regulated genes in the guinea pig, a 
resistant species (Rockett, Swales, Esda and Gibson, unpublished observations). 
One of these genes, CD81, was up-regulated in the rat and down-regulated in the 
guinea pig following Wy-14,643 treatment. CD81 (alternatively named TAPA-1) is 
a widely expressed cell surface protein which is involved in a large number of cellular 
processes including adhesion, activation, proliferation and differentiation (Levy et 
al. 1998). Since all of these functions are altered to some extent in the phenomena 
of hepatomegaly and non-genotoxic hepatocarcinogenesis, it is intriguing, and 
probably mechanistically relevant, that CD81 expression is differentially regulated 
in a resistant and susceptible species. However, the down-side of this approach is 
that the majority of genes can be sequenced and matched to database sequences, but 
the latter are predominantly expressed sequence tags or genes of completely 
unknown function, thus partially obscuring a realistic overall assessment of the 
critical genes of genuine biological interest. Notwithstanding the lack of complete 
functional identification of altered gene expression, such gene profiling studies 
essentially provide a 'molecular fingerprint' in response to xenobiotic challenge, 
thereby serving as a mechanistically-relevant platform for further detailed 
investigations. 

Differential Display (DD) 

Originally described as 'RNA fingerprinting by arbitrarily primed PCR' (Liang 
and Pardee 1992), this method is now more commonly referred to as 'differential 

Table 2. Genes up-regulated in rat liver following 3-day exposure to phenobarbital. 

Band number         Highest sequence 
(approximate        similarity           FASTA-EMBL gene identification 
size in bp) 

5 (1300)            93.5%                CYP2B1 
7 (1000)            95.1%                Preproalbumin 
                                         Serum albumin mRNA 
8 (950)             98.3%                NCI-CGAP-Pr1 H. sapiens (EST) 
10 (850)            95.7%                CYP2B1 
11 (800)            Clone 1  94.9%       CYP2B1 
                    Clone 2  75.3%       CYP2B2 
12 (750)            93.8%                TRPM-2 mRNA 
                                         Sulfated glycoprotein 
15 (600)            92.9%                Preproalbumin 
                                         Serum albumin mRNA 
16 (55)             Clone 1  95.2%       CYP2B1 
                    Clone 2  93.6%       Haptoglobin mRNA partial alpha 
21 (350)            99.3%                18S, 5.8S & 28S rRNA 

Bands 1-6, 9, 13, 14, and 17-20 were shown to be false positives by dot blot analysis and, therefore, 
were not sequenced. Derived from Rockett et al. (1997). It should be noted that the above genes do not 
represent the complete spectrum of genes which are up-regulated in rat liver by phenobarbital, but 
simply represent the genes sequenced and identified to date. 

Table 3. Genes down-regulated in rat liver following 3-day exposure to phenobarbital. 

Band number         Highest sequence 
(approximate        similarity           FASTA-EMBL gene identification 
size in bp) 

1 (1500)            95.3%                3-oxoacyl-CoA thiolase 
2 (1200)            92.3%                Hemopexin mRNA 
3 (1000)            91.7%                Alpha-2u-globulin mRNA 
7 (700)             Clone 1  77.2%       M. musculus C1 inhibitor 
                    Clone 2  94.5%       Electron transfer flavoprotein 
                    Clone 3  91.0%       M. musculus Topoisomerase 1 (Topo 1) 
8 (650)             Clone 1  86.9%       Soares 2NbMT M. musculus (EST) 
                    Clone 2  96.2%       Alpha-2u-globulin (s-type) mRNA 
9 (600)             Clone 1  86.9%       Soares mouse NML M. musculus (EST) 
                    Clone 2  82.0%       Soares p3NMF19.5 M. musculus (EST) 
10 (550)            73.8%                Soares mouse NML M. musculus (EST) 
11 (525)            95.7%                NCI-CGAP-Pr1 H. sapiens (EST) 
12 (375)            100.0%               Ribosomal protein 
13 (23)             Clone 1  97.2%       Soares mouse embryo NbME13.5 (EST) 
                    Clone 2  100.0%      Fibrinogen B-beta-chain 
                    Clone 3  100.0%      Apolipoprotein E gene 
14 (170)            96.0%                Soares p3NMF19.5 M. musculus (EST) 
15 (140)            97.3%                Stratagene mouse testis (EST) 
Others: (300)       96.7%                R. norvegicus RASP 1 mRNA 
        (275)       93.1%                Soares mouse mammary gland (EST) 

EST = Expressed sequence tag. Bands 4-6 were shown to be false positives by dot blot analysis and, 
therefore, were not sequenced. Derived from Rockett et al. (1997). It should be noted that the above genes 
do not represent the complete spectrum of genes which are down-regulated in rat liver by phenobarbital, 
but simply represent the genes sequenced and identified to date. 

display compared to the other, are differentially expressed and may be recovered for 
further characterization. One advantage of this system is the speed with which it can 
be carried out — 2 days to obtain a display and as little as a week to make and identify 
clones. 

Two commonly used variations are based on different methods of priming the 
reverse transcription step (figure 8). One is to use an oligo dT with a 2-base 'anchor' 
at the 3'-end, e.g. 5'-(dT)nCA-3' (Liang and Pardee 1992). Alternatively, an 
arbitrary primer may be used for 1st strand cDNA synthesis (Welsh et al. 1992). 
This variant of RNA fingerprinting has also been called 'RAP' (RNA Arbitrarily 
Primed)-PCR. One advantage of this second approach is that PCR products may be 
derived from anywhere in the RNA, including open reading frames. In addition, it 
can be used for mRNAs that are not polyadenylated, such as many bacterial mRNAs 
(Wong and McClelland 1994). In both cases, following reverse transcription and 
denaturation, second strand cDNA synthesis is carried out with an arbitrary primer 
(arbitrary primers have a single base at each position, as compared to random 
primers, which contain a mixture of all four bases at each position). The resulting 
PCR, thus, produces a series of products which, depending on the system (primer 
length and composition, polymerase and gel system), usually includes 50-100 
products per primer set (Band and Sager 1989). When a combination of different 
dT-anchors and arbitrary primers are used, almost all mRNA species from a cell can 
be amplified. When the cDNA products from two different populations are analysed 
side by side on a polyacrylamide gel, differences in expression can be identified and 
the appropriate bands recovered for cloning and further analysis. 
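
The way dT-anchors and arbitrary primers combine into primer sets can be illustrated with a minimal Python sketch. It assumes a run of 11 T residues and anchor bases restricted to G, C and A at both positions (as in the caption to figure 8); the two arbitrary 10-mers are hypothetical sequences invented for illustration, and the product range simply applies the 50-100 products per primer set quoted above.

```python
from itertools import product


def anchored_primers(dt_length=11, anchor_bases="GCA"):
    """Return oligo-dT primers, one for each 2-base 3' anchor built from anchor_bases."""
    return ["T" * dt_length + "".join(pair) for pair in product(anchor_bases, repeat=2)]


def primer_sets(anchored, arbitrary):
    """Pair every anchored primer with every arbitrary primer (one reaction per pair)."""
    return list(product(anchored, arbitrary))


def expected_products(n_sets, per_set=(50, 100)):
    """Range of products expected at 50-100 products per primer set."""
    return n_sets * per_set[0], n_sets * per_set[1]


anchored = anchored_primers()                # 3 x 3 = 9 anchored primers (assumed dT length 11)
arbitrary = ["AGCCAGCGAA", "GACCGCTTGT"]     # hypothetical 10-mers, illustration only
sets = primer_sets(anchored, arbitrary)      # 9 x 2 = 18 primer sets
low, high = expected_products(len(sets))     # product range across all sets
print(len(anchored), len(sets), low, high)
```

Each additional arbitrary primer adds nine further reactions under these assumptions, which is why broad coverage of the mRNA population requires many primer combinations.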

Although DD is perhaps the most popular approach used today for identifying 
differentially expressed genes, it does suffer from several perceived disadvantages: 

(1) It may have a strong bias towards high copy number mRNAs (Bertioli et al.
1995), although this has been disputed (Wan et al. 1996) and the isolation of very
low abundance genes may be achieved in certain circumstances (Guimeraes et
al. 1995a).

(2) The cDNAs obtained often only represent the extreme 3' end of the mRNA
(often the 3'-untranslated region), although this may not always be the case
(Guimeraes et al. 1995a). Since the 3' end is often not included in Genbank and
shows variation between organisms, cDNAs identified by DD cannot always be
matched with their genes, even if those genes have been identified.

(3) The pattern of differential expression seen on the display often cannot be
reproduced on Northern blots, with false positives arising in up to 70% of cases
(Sun et al. 1994). Some adaptations have been shown to reduce false positives,
including the use of two reverse transcriptases (Sung and Denman 1997),
comparison of uninduced and induced cells over a time course (Burn et al. 1994)
and comparison of DDPCR-products from two uninduced and two induced
lines (Sompayrac et al. 1995). The latter authors also reported that the use of
cytoplasmic RNA rather than total RNA reduces false positives arising from
nuclear RNA that is not transported to the cytoplasm.

Further details of the background, strengths and weaknesses of the DD
technique can be obtained from a review by McClelland et al. (1996) and from
articles by Liang et al. (1995) and Wan et al. (1996).

[Figure 8 diagram. Recoverable labels: mRNA poly(A) tail (AAAAAAAA); anchored primer (AC);
2nd strand cDNA; cDNA can now be amplified by PCR using original primer pair.]

Figure 8. Two approaches to differential display (DD) analysis. 1st strand synthesis can be carried out
either with a polydT(n)NN primer (where N = G, C or A) or with an arbitrary primer. The use of
different combinations of G, C and A to anchor the first strand polydT primer enables the priming
of the majority of polyadenylated mRNAs. Arbitrary primers may hybridize at none, one or more
places along the length of the mRNA, allowing 1st strand cDNA synthesis to occur at none, one
or more points in the same gene. In both cases, 2nd strand synthesis is carried out with an arbitrary
primer. Since these arbitrary primers for the 2nd strand may also hybridize to the 1st strand cDNA
in a number of different places, several different 2nd strand products may be obtained from one
binding point of the 1st strand primer. Following 2nd strand synthesis, the original set of primers
is used to amplify the second strand products, with the result that numerous gene sequences are
amplified.



Restriction endonuclease-facilitated analysis of gene expression 
Serial Analysis of Gene Expression (SAGE) 

A more recent development in the field of differential display is SAGE analysis 
(Velculescu et al. 1995). This method uses a different approach to those discussed so
far and is based on two principles. Firstly, in more than 95% of cases, short
nucleotide sequences ('tags') of only nine or 10 base pairs provide sufficient
information to identify their gene of origin. Secondly, concatenation (linking
together in a series) of these tags allows sequencing of multiple cDNAs within a
single clone. Figure 9 shows a schematic representation of the SAGE process. In this
procedure, double stranded cDNA from the test cells is synthesized with a
biotinylated polydT primer. Following digestion with a commonly cutting (4bp
recognition sequence) restriction enzyme ('anchoring enzyme'), the 3' ends of the
cDNA population are captured with streptavidin beads. The captured population is
split into two and different adaptors ligated to the 5' ends of each group. Incorporated
into the adaptors is a recognition sequence for a type IIS restriction enzyme, one
which cuts DNA at a defined distance (< 20 bp) from its recognition sequence.
Hence, following digestion of each captured cDNA population with the IIS enzyme,
the adaptors plus a short piece of the captured cDNA are released. The two
populations are then ligated and the products amplified. The amplified products are
cleaved with the original anchoring enzyme, religated (concatemers are formed in
the process) and cloned. The advantage of this system is that hundreds of gene tags 
can be identified by sequencing only a few clones. Furthermore, the number of times 
a given transcript is identified is a quantitative measurement of that gene's 
abundance in the original population, a feature which facilitates identification of 
differentially expressed genes in different cell populations. 
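
The tag-counting principle behind SAGE can be sketched in silico. The following Python example is only an illustration under stated assumptions: the anchoring-enzyme site is taken to be CATG (the text specifies only a 4bp recognition sequence), the tag length is fixed at 10 bases, and the input sequences are invented. It takes the bases immediately 3' of the 3'-most anchoring site as the tag and counts how often each tag occurs.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumed 4-bp anchoring-enzyme site; not specified in the text
TAG_LENGTH = 10       # tags of "nine or 10 base pairs"


def sage_tag(cdna, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag adjacent to the 3'-most anchor site, or None if no full tag exists."""
    site = cdna.rfind(anchor)
    if site == -1:
        return None
    start = site + len(anchor)
    tag = cdna[start:start + tag_length]
    return tag if len(tag) == tag_length else None


def tag_counts(cdnas):
    """Count tags across a population; the count stands in for transcript abundance."""
    return Counter(tag for tag in map(sage_tag, cdnas) if tag is not None)


# Hypothetical cDNA sequences (sense strand, 5' to 3'), for illustration only.
control = ["GGCATGAAACCCGGGTTTAAAA"] * 3 + ["CATGTTTTTTTTTTCATGACGTACGTACGG"]
counts = tag_counts(control)
print(counts)
```

Comparing the resulting counters from two cell populations gives a direct, if simplified, picture of how tag frequency serves as a quantitative measure of expression.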

Disadvantages of SAGE analysis include the technical difficulty of the
method, the large amount of accurate sequencing required and a bias towards abundant
mRNAs. In addition, the method has not been validated in the pharmaco/toxicogenomic
setting and has only been used to examine well known tissue differences to date.

Gene Expression Fingerprinting (GEF) 

A different capture/restriction digest approach for isolating differentially 
expressed genes has been described by Ivanova and Belyavsky (1995). In this 
method, RNA is converted to cDNA using biotinylated oligo(dT) primers. The 
cDNA population is then digested with a specific endonuclease and captured with 
magnetic streptavidin microbeads to facilitate removal of the unwanted 5' digestion 
products. The use of restricted 3'-ends alone serves to reduce the complexity of the 
cDNA fragment pool and helps to ensure that each RNA species is represented by 
not more than one restriction product. An adaptor is ligated to facilitate subsequent 
amplification of the captured population. PCR is carried out with one adaptor- 
specific and one biotinylated polydT primer. The reamplified population is 
recaptured and the non-biotinylated strands removed by alkaline dissociation. The 
non-biotinylated strand is then resynthesized using a different adaptor-specific 
primer in the presence of a radiolabeled dNTP. The labelled immobilized 3' cDNA 
ends are next sequentially treated with a series of different restriction endonucleases 
and the products from each digestion analysed by PAGE. The result is a fingerprint 
composed of a number of ladders (equal to the number of sequential digests used). 
By comparing test versus control fingerprints, it is possible to identify differentially 
expressed products which can then be isolated from the gel and cloned. The 
advantages of this procedure are that it is very robust and reproducible, and the 
authors estimate that 80-93% of cDNA molecules are involved in the final 
fingerprint. The disadvantage is that polyacrylamide gels can rarely resolve more
than 300-400 bands, which compares poorly to the 1000 or more which are
estimated to be produced in an average experiment. The use of 2-D gels such as
those described by Uitterlinden et al. (1989) and Hatada et al. (1991) may help to
overcome this problem.

A similar method for displaying restriction endonuclease fragments was later
described by Prashar and Weissman (1996). However, instead of sequential
digestion of the immobilized 3'-terminal cDNA fragments, these authors simply
compared the profiles of the control and treated populations without further
manipulation.

Figure 9. Serial analysis of gene expression (SAGE) analysis. cDNA is cleaved with an anchoring enzyme
(AE) and the 3' ends captured using streptavidin beads. The cDNA pool is divided in half and each
portion ligated to a different linker, each containing a type IIS restriction site (tagging enzyme,
TE). Restriction with the type IIS enzyme releases the linker plus a short length of cDNA
(XXXXX and OOOOO indicate nucleotides of different tags). The two pools of tags are then
ligated and amplified using linker-specific primers. Following PCR, the products are cleaved with
the AE, and the ditags isolated from the linkers using PAGE. The ditags are then ligated (during
which process, concatenation occurs) and cloned into a vector of choice for sequencing. After
Velculescu et al. (1995), with permission.

DNA arrays

'Open' differential display systems are cumbersome in that it takes a great deal
of time to extract and identify candidate genes and then confirm that they are indeed
up- or down-regulated in the treated compared to the control tissue. Normally, the
latter process is carried out using Northern blotting or RT-PCR. Even so, each of
the aforementioned steps produces a bottleneck to the ultimate goal of rapid analysis
of gene expression. These problems will likely be addressed by the development of
so-called DNA arrays (e.g. Gress et al. 1992, Zhao et al. 1995, Schena et al. 1996),
the introduction of which has signalled the next era in differential gene expression
analysis. DNA arrays consist of a gridded membrane or glass 'chip' containing
hundreds or thousands of DNA spots, each consisting of multiple copies of part of
a known gene. The genes are often selected based on previously proven involvement
in oncogenesis, cell cycling, DNA repair, development and other cellular processes.
The spotted sequences are usually chosen to be as specific as possible for each gene
and animal species. Human and mouse arrays are already commercially available and a
few companies will construct a personalized array to order, for example Clontech
Laboratories and Research Genetics Inc. The technique is rapid in that hundreds or
even thousands of genes can be spotted on a single array, and that mRNA/cDNA from
the test populations can be labelled and used directly as probe. When analysed with
appropriate hardware and software, arrays offer a rapid and quantitative means to
assess differences in gene expression between two cell populations. Of course, there
can only be identification and quantitation of those genes which are in the array
(hence the term 'closed' system). Therefore, one approach to elucidating the
molecular mechanisms involved in a particular disease/development system may be
to combine an open and a closed system: a DNA array to directly identify and
quantitate the expression of known genes in mRNA populations, and an open
system such as SSH to isolate unknown genes which are differentially expressed.
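
As a rough illustration of how array data might be compared between two populations, the Python sketch below computes a signed fold change per spot and keeps those at or above a cut-off. The gene names, intensity values and the 2-fold cut-off are all assumptions made for the example, not values from any particular array platform; only genes present on the array can be reported, which is the 'closed' property described above.

```python
def fold_changes(control, treated, threshold=2.0):
    """Map gene -> signed fold change for genes changing by at least `threshold`-fold.

    Positive values mean up-regulation in the treated sample, negative values
    mean down-regulation. Genes missing from either sample are ignored.
    """
    changed = {}
    for gene in control.keys() & treated.keys():
        c, t = control[gene], treated[gene]
        if c <= 0 or t <= 0:
            continue  # a ratio is undefined without signal in both samples
        ratio = t / c
        if ratio >= threshold:
            changed[gene] = ratio
        elif ratio <= 1 / threshold:
            changed[gene] = -1 / ratio
    return changed


# Hypothetical background-corrected spot intensities, for illustration only.
control = {"geneA": 100, "geneB": 200, "geneC": 50}
treated = {"geneA": 400, "geneB": 190, "geneC": 10, "geneD": 75}
result = fold_changes(control, treated)
print(sorted(result.items()))
```

In this toy example geneD is never reported because it has no counterpart in the control data, and geneB falls below the cut-off.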

One of the main advantages of DNA arrays is the huge number of gene fragments 
which can be put on a membrane — some companies have reported gridding up to 
60000 spots on a single glass 'chip' (microscope slide). These high density chip- 
based micro-arrays will probably become available as mass-produced off-the-shelf 
items in the near future. This should facilitate the more rapid determination of 
differential expression in time and dose-response experiments. Aside from their
high cost and the technical complexities involved in producing and probing DNA 
arrays, the main problem which remains, especially with the newer micro-array 
(gene-chip) technologies, is that results are often not wholly reproducible between 
arrays. However, this problem is being addressed and should be resolved within the 
next few years. 



EST databases as a means to identify differentially expressed genes

Expressed sequence tags (ESTs) are partial sequences of clones obtained from 
cDNA libraries. Even though most ESTs have no formal identity (putative 
identification is the best to be hoped for), they have proven to be a rapid and efficient 
means of discovering new genes and can be used to generate profiles of gene
expression in specific cells. Since they were first described by Adams et al. (1991),
there has been a huge explosion in EST production and it is estimated that there are
now well over a million such sequences in the public domain, representing over half
of all human genes (Hillier et al. 1996). This large number of freely available
sequences (both sequence information and clones are normally available royalty-free
from the originators) has enabled the development of a new approach towards
differential gene expression analysis as described by Vasmatzis et al. (1998). The
approach is simple in theory: EST databases are first searched for genes that have a
number of related EST sequences from the target tissue of choice, but none or few
from non-target tissue libraries. Programmes to assist in the assembly of such sets of
overlapping data may be developed in-house or obtained privately or from the
internet. For example, the Institute for Genomic Research (TIGR, found at
http://www.tigr.org) provides many software tools free of charge to the scientific
community. Included amongst these is the TIGR assembler (Sutton et al. 1995), a
tool for the assembly of large sets of overlapping data such as ESTs, bacterial
artificial chromosomes (BACs), or small genomes. Candidate EST clones
representing different genes are then analysed using RNA blot methods for size and tissue
specificity and, if required, used as probes to isolate and identify the full length
cDNA clone for further characterization. In practice, however, the method is rather
more involved, requiring bioinformatic and computer analysis coupled with
confirmatory molecular studies. Vasmatzis et al. (1998) have described several
problems in this fledgling approach, such as separating highly homologous 
sequences derived from different genes and an overemphasis of specificity for some 
EST sequences. However, since these problems will largely be addressed by the 
development of more suitable computer algorithms and an increased completeness 
of the EST database, it is likely that this approach to identifying differentially 
expressed genes may enjoy more patronage in the future. 
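
The database search step described above (many ESTs from the target tissue, none or few from other libraries) can be sketched as a simple count filter. This is a hedged illustration rather than the procedure of Vasmatzis et al.: the cluster identifiers, tissue labels and the two thresholds are hypothetical, and real data would first need assembling into gene clusters.

```python
from collections import Counter, defaultdict


def tissue_specific_genes(ests, target, min_target=3, max_other=1):
    """Return gene clusters with many ESTs from `target` and few from elsewhere.

    ests: iterable of (cluster_id, tissue_of_origin) pairs.
    min_target and max_other are illustrative thresholds, not published values.
    """
    counts = defaultdict(Counter)
    for gene, tissue in ests:
        counts[gene][tissue] += 1
    hits = []
    for gene, per_tissue in counts.items():
        in_target = per_tissue[target]
        elsewhere = sum(per_tissue.values()) - in_target
        if in_target >= min_target and elsewhere <= max_other:
            hits.append(gene)
    return sorted(hits)


# Hypothetical EST records, for illustration only.
ests = (
    [("clusterA", "prostate")] * 4 + [("clusterA", "liver")]
    + [("clusterB", "prostate")] * 2
    + [("clusterC", "prostate")] * 3 + [("clusterC", "liver")] * 5
)
candidates = tissue_specific_genes(ests, target="prostate")
print(candidates)
```

Candidates passing such a filter would still need the confirmatory RNA blot analysis described in the text, since EST counts alone can overstate specificity.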



Problems and potential of differential expression techniques 

The holistic or single cell approach? 

When working with in vivo models of differential expression, one of the first 
issues to consider must be the presence of multiple cell types in any given specimen. 
For example, a liver sample is likely to contain not only hepatocytes, but also 
(potentially) Ito cells, bile ductule cells, endothelial cells, various immune cells (e.g. 
lymphocytes, macrophages and Kupffer cells) and fibroblasts. Other tissues will 
each have their own distinctive cell populations. Also, in the case of neoplastic tissue,
there are almost always normal, hyperplastic and/or dysplastic cells present in a
sample. One must, therefore, be aware that genes obtained from a differential
display experiment performed on an animal tissue model may not necessarily arise
exclusively from the intended 'target' cells, e.g. hepatocytes/neoplastic cells. If
appropriate, further analyses using immunohistochemistry, in situ hybridization or
in situ RT-PCR should be used to confirm which cell types are expressing the
gene(s) of interest. This problem is probably most acute for those studying the
differential expression of genes in the development of different cell types, where
there is a need to examine homologous cell populations. The problem is now being
addressed at the National Cancer Institute (Bethesda, MD, USA) where new
micro-dissection techniques have been employed to assist in their gene analysis programme,
the Cancer Genome Anatomy Project (CGAP) (for more information see web site:
http://www.ncbi.nlm.nih.gov/ncicgap/intro.html). There are also separation
techniques available that utilise cell-specific antigens as a means to isolate target cells,
e.g. fluorescence activated cell sorting (FACS) (Dunbar et al. 1998, Kas-Deelen et
al. 1998) and magnetic bead technology (Richard et al. 1998, Rogler et al. 1998).

However, those taking a holistic approach may consider this issue unimportant. 
There is an equally appropriate view that all those genes showing altered expression 
within a compromized tissue should be taken into consideration. After all, since all 
tissues are complex mixes of different, interacting cell types which intimately
regulate each other's growth and development, it is clear that each cell type could in 
some way contribute (positively or negatively) towards the molecular mechanisms 
which lie behind responses to external stimuli or neoplastic growth. It is perhaps 
then more informative to carry out differential display experiments using in vivo as 
opposed to in vitro models, where uniform populations of identical cells probably 
represent a partial, skewed or even inaccurate picture of the molecular changes that 
occur. 

The incidence and possible implications of inter-individual biological variation 
should be considered in any approach where whole animal models are being used. It 
is clear that individuals (humans and animals) respond in different ways to identical 
stimuli. One of the best characterized examples is the debrisoquine oxidation 
polymorphism, which is mediated by cytochrome CYP2D6 and determines the 
pharmacokinetics of many commonly prescribed drugs (Lennard 1993, Meyer and 
Zanger 1997). The reasons for such differences are varied and complex, but allelic 
variations, regulatory region polymorphisms and even physical and mental health 
can all contribute to observed differences in individual responses. Careful thought 
should, therefore, be given to the specific objectives of the study and to the possible 
value of pooling starting material (tissue/mRNA). The effect of this can be 
beneficial through the ironing out of exaggerated responses and unimportant minor 
fluctuations of (mechanistically) irrelevant genes in individual animals, thus 
providing a clearer overall picture of the general molecular mechanisms of the 
response. However, at the same time such minor variations may be of utmost 
importance in deciding the ability of individual animals to succumb to or resist the 
effects of a given chemical/disease. 



How efficient are differential expression techniques at recovering a high percentage of
differentially expressed genes?

A number of groups have produced experimental data suggesting that mammalian
cells produce between 8000 and 15000 different mRNA species at any one time
(Mechler and Rabbitts 1981, Hedrick et al. 1984, Bravo 1990), although figures as
high as 20-30000 have also been quoted (Axel et al. 1976). Hedrick et al. (1984)
provided evidence suggesting that the majority of these belong to the rare abundance
class. A breakdown of this abundance distribution is shown in table 1.

When the results of differential display experiments have been compared with
data obtained previously using other methods, it is apparent that not all differentially
expressed mRNAs are represented in the final display. In particular, rare messages
(which, importantly, often encode regulatory proteins) are not easily recovered
using differential display systems. This is a major shortcoming, as the majority of
mRNA species exist at levels of less than 0.005% of the total population (table 1).
Bertioli et al. (1995) examined the efficiency of DD templates (heterogeneous
mRNA populations) for recovering rare messages and were unable to detect mRNA
species present at less than 1.2% of the total mRNA population, equivalent to an
intermediate or abundant species. Interestingly, when simple model systems (single 
target only) were used instead of a heterogeneous mRNA population, the same 
primers could detect levels of target mRNA down to 10000 x smaller. These results 
are probably best explained by competition for substrates from the many PCR 
products produced in a DD reaction. 
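
The size of the gap between these detection limits and the rare abundance class can be made explicit with a little arithmetic. The Python sketch below uses only the figures quoted above and assumes that '10000 x smaller' means a detection limit lower by a factor of 10000; exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

complex_limit = Fraction("1.2")    # % of total mRNA: limit with a heterogeneous template
rare_ceiling = Fraction("0.005")   # % of total mRNA: most species lie below this level
model_gain = 10000                 # assumed reading of "10000 x smaller"

# Detection limit in the simple single-target model system, as % of total mRNA.
single_target_limit = complex_limit / model_gain

# How far the heterogeneous-template limit sits above the rare-message ceiling.
shortfall = complex_limit / rare_ceiling

print(float(single_target_limit), shortfall)
print(single_target_limit < rare_ceiling)
```

On these numbers the single-target system would reach well into the rare class, whereas the heterogeneous template falls short of it by more than two orders of magnitude.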

The numbers of differentially expressed mRNAs reported in the literature using
various model systems provide further evidence that many differentially expressed
mRNAs are not recovered. For example, DeRisi et al. (1997) used DNA array 
technology to examine gene expression in yeast following exhaustion of sugar in the 
medium, and found that more than 1700 genes showed a change in expression of at 
least 2-fold. In light of such a finding, it would not be unreasonable to suggest that 
of the 8000-15000 different mRNA species produced by any given mammalian cell,
up to 1000 or more may show altered expression following chemical stimulation. 
Whilst this may be an extreme figure, it is known that at least 100 genes are 
activated/upregulated in Jurkat (T-) cells following IL-2 stimulation (Ullman et al. 
1990). In addition, Wan et al. (1996) estimated that interferon-γ-stimulated HeLa
cells differentially express up to 433 genes (assuming 24000 distinct mRNAs 
expressed by the cells). However, there have been few publications documenting 
anywhere near the recovery of these numbers. For example, in using DD to compare 
normal and regenerating mouse liver, Bauer et al. (1993) found only 70 of 38000 
total bands to be different. Of these, 50% (35 genes) were shown to correspond to 
differentially expressed bands. Chen et al. (1996) reported 10 genes upregulated in 
female rat liver following ethinyl estradiol treatment. McKenzie and Drake (1997) 
identified 14 different gene products whose expression was altered by phorbol 
myristate acetate (PMA, a tumour promoter agent) stimulation of a human 
myelomonocytic cell line. Kilty and Vickers (1997) identified 10 different gene 
products whose expression was upregulated in the peripheral blood leukocytes of 
allergic disease sufferers. Linskens et al. (1995) found 23 genes differentially 
expressed between young and senescent fibroblasts. Techniques other than DD 
have also provided an apparent paucity of differentially expressed genes. Using SH 
for example, Cao et al. (1997) found 15 genes differentially expressed in colorectal 
cancer compared to normal mucosal epithelium. Fitzpatrick et al. (1995) isolated 17 
genes upregulated in rat liver following treatment with the peroxisome proliferator,
clofibrate; Philips et al. (1990) isolated 12 cDNA clones which were upregulated in
highly metastatic mammary adenocarcinoma cell lines compared to poorly meta- 
static ones. Prashar and Weissman (1996) used 3' restriction fragment analysis and 
identified approximately 40 genes showing altered expression within 4 h of 
activation of Jurkat T-cells. Groenink and Leegwater (1996) analysed 27 gene 
fragments isolated using SSH of delayed early response phase of liver regeneration 
and found only 12 to be upregulated. 

In the laboratory, SSH was used to isolate up to 70 candidate genes which appear 
to show altered expression in guinea pig liver following short-term treatment with 
the peroxisome proliferator, WY- 14,643 (Rockett, Swales, Esdaile and Gibson, 
unpublished observations). However, these findings have still to be confirmed by 
analysis of the extracted tissue mRNA for differential expression of these sequences.
Whilst the latest differential display technologies are purported to include design
and experimental modifications to overcome this lack of efficiency (in both the total
number of differentially expressed genes recovered and the percentage that are true
positives), it is still not clear if such adaptations are practically effective: proving
efficiency by spiking with a known amount of limited numbers of artificial
construct(s) is one thing, but isolating a high percentage of the rare messages already
present in an mRNA population is another. Of course, some models will genuinely
produce only a small number of differentially expressed genes. In addition, there are
also technical problems that can reduce efficiency. For example, mRNAs may have
an unusual primary structure that effectively prevents their amplification by PCR-
based systems. In addition, it is known that under certain circumstances not all
mRNAs have 3' polyA sites. For example, during Xenopus development,
deadenylation is used as a means to stabilize RNAs (Voeltz and Steitz 1998), whilst
preferential deadenylation may play a role in regulating Hsp70 (and perhaps,
therefore, other stress protein) expression in Drosophila (Dellavalle et al. 1994). The
presence of deadenylated mRNAs would clearly reduce the efficiency of systems
utilizing a polydT reverse transcription step. The efficiency of any system also
depends on the quality of the starting material. All differential display techniques
use mRNA as their target material. However, it is difficult to isolate mRNA that is
completely free of ribosomal RNA. Even if polydT primers are used to prime first
strand cDNA synthesis, ribosomal RNA is often transcribed to some degree
(Clontech PCR-Select cDNA Subtraction kit user manual). It has been shown, at
least in the case of SSH, that a high rRNA : mRNA ratio can lead to inefficient
subtractive hybridization (Clontech PCR-Select cDNA Subtraction kit user
manual) and there is no reason to suppose that it will not do likewise in other SH
approaches. Finally, those techniques that utilise a presubtraction amplification step
(e.g. RDA) may present a skewed representation since some sequences amplify
better than others.

Of course, probably the most important consideration is the temporal factor. It
is clear that any given differential display experiment can only interrogate a cell at
one point in time. It may well be that a high percentage of the genes showing altered
expression at that time are obtained. However, given that disease processes and
responses to environmental stimuli involve dynamic cascades of signalling,
regulation, production and action, it is clear that all those genes which are switched
on/off at different times will not be recovered and, therefore, vital information may
well be missed. It is, therefore, imperative to obtain as much information about the
model system beforehand as possible, from which a strategy can be derived for
targeting specific time points or events that are of particular interest to the
investigator. One way of getting round this problem of single time point analysis is
to conduct the experiment over a suitable time course which, of course, adds
substantially to the amount of work involved. 



How sensitive are differential expression technologies? 

There has been little published data that addresses the issue of how large the
change in expression must be for it to permit isolation of the gene in question with
the various differential expression technologies. Although the isolation of genes
whose expression is changed as little as 1.5-fold has been reported using SSH
(Groenink and Leegwater 1996), it appears that those demonstrating a change in
excess of 5-fold are more likely to be picked up. Thus, there is a 'grey zone'
in between where small changes could fade in and out of isolation between
experiments and animals. DD, on the other hand, is not subject to this grey
zone since, unlike SH approaches, it does not amplify the difference in expression
between two samples. Wan et al. (1996) reported that differences in expression of
twofold or more are detectable using DD. 
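
These thresholds can be summarised in a short Python sketch. It encodes the figures quoted above (1.5-fold and 5-fold for SSH, twofold for DD); treating down-regulation symmetrically with up-regulation, and labelling changes below 1.5-fold as 'unlikely' to be recovered, are simplifying assumptions made for the example.

```python
def magnitude(fold_change):
    """Fold change as a value >= 1, whichever direction the change goes."""
    if fold_change <= 0:
        raise ValueError("fold_change must be a positive treated/control ratio")
    return max(fold_change, 1 / fold_change)


def ssh_recovery(fold_change):
    """Classify the chance of isolating a gene by SSH from its fold change."""
    m = magnitude(fold_change)
    if m > 5:
        return "likely"
    if m >= 1.5:
        return "grey zone"
    return "unlikely"


def dd_detectable(fold_change):
    """Differences of twofold or more are reported as detectable by DD."""
    return magnitude(fold_change) >= 2


for ratio in (8, 3, 0.25, 1.2):
    print(ratio, ssh_recovery(ratio), dd_detectable(ratio))
```

A 3-fold change, for instance, sits in the SSH grey zone yet is above the reported DD detection threshold, which is the contrast the paragraph draws.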



Resolution and visualization of differential expression products 

It seems highly improbable with current technology that a gel system could be 
developed that is able to resolve all gene species showing altered expression in any 
given test system (be it SH- or DD-based). Polyacrylamide gel electrophoresis 
(PAGE) can resolve size differences down to 0.2% (Sambrook et al. 1989) and is
used as standard in DD experiments. Even so, it is clear that a complex series of gene
products such as those seen in a DD will contain unresolvable components. Thus,
what appears to be one band in a gel may in fact turn out to be several. Indeed, it has
been well documented (Mathieu-Daude et al. 1996, Smith et al. 1997) that a single
band extracted from a DD often represents a composite of heterogeneous products,
and the same has been found for SSH displays in this laboratory (Rockett et al. 
1997). One possible solution was offered by Mathieu-Daude et al. (1996), who 
extracted and reamplified candidate bands from a DD display and used single strand 
conformation polymorphism (SSCP) analysis to confirm which components 
represented the truly differentially expressed product. 

Many scientists try to avoid the use of PAGE where possible because it is
technically more demanding than agarose gel electrophoresis (AGE). Unfortunately,
high resolution agarose gels such as Metaphor (FMC, Lichfield, UK) and AquaPor
HR (National Diagnostics, Hessle, UK), whilst easier to prepare and manipulate
than PAGE, can only separate DNA sequences which differ in size by around
1.5-2% (15-20 base pairs for a 1Kb fragment). Thus, SSH, RDA or other such
products which differ in size by less than this amount are normally not resolvable.
However, a simple technique does in fact exist for increasing the resolving power of
AGE: the inclusion of HA-red (10-phenyl neutral red-PEG ligand) or HA-yellow
(bisbenzamide-PEG ligand) (Hanse Analytik GmbH, Bremen, Germany) in a
gel separates identical or closely sized products on base content. Specifically,
HA-red and -yellow selectively bind to GC and AT DNA motifs, respectively
(Wawer et al. 1995, Hanse Analytik 1997, personal communication). Since both
HA-stains possess an overall positive charge, they migrate towards the cathode
when an electric field is applied. This is in direct opposition to DNA, which
is negatively charged and, therefore, migrates towards the anode. Thus, if two
DNA clones are identical in size (as perceived on a standard high resolution
agarose gel), but differ in AT/GC content, inclusion of a HA-dye in the gel
will effectively retard the migration of one of the sequences compared to the
other, making it apparently larger and, thus, providing a means of
differentiating between the two. The use of HA-red has been shown to resolve
sequences with an AT variation of less than 1% (Wawer et al. 1995), whilst Hanse
Analytik have reported that HA staining is so sensitive that in one case it was used
to distinguish two 567bp sequences which differed by only a single point mutation
(Hanse Analytik 1996, personal communicati on). Therefore, if one wishes to ch ck 
whether all the clones produced from a specific band in a differential display 
-experiment -a re derived from the- same ge ne s pec ies, a small-amount of reamplified 
or digested clone can be run on a standard high resolution gel, and a second aliquot 
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Figure 10. Discrimination of clones of identical/ nearly identical size using HA-red. Bands of decreasing 
size (1-5) were extracted from the final display of a suppression subtractive hybridization 
experiment and cloned. Seven colonies were picked at random from each cloned band and their 
inserts amplified using PCR. The products were run on two gels, (A) a high resolution 2 % agarose 
gel, and (B) a high resolution 2 % agarose gel containing 1 U/ml HA-red. With few exceptions, all 
the clones from each band appear to be the same size (gel A). However, the presence of HA-red 
(gel B), which separates identically-sized DNA fragments based on the percentage of GC within 
the sequence, clearly indicates the presence of different gene species within each band. For 
example, even though all five re-amplified clones of band 1 appear to be the same size, at least four 
different gene species are represented.

in a similar gel containing one of the HA-stains. The standard gel should indicate 
any gross size differences, whilst the HA-stained gel should separate otherwise 
unresolvable species (on standard AGE) according to their base content. Geisinger 
et al. (1997) reported successful use of this approach for identifying DD-derived 
clones. Figure 10 shows such an experiment carried out in this laboratory on clones 
obtained from a band extracted from an SSH display. 

An alternative approach is to carry out a 2-D analysis of the differential display
products. In this approach, size-based separation is first carried out in a standard
agarose gel. The gel slice containing the display is then extracted and incorporated
into a HA gel for resolution based on AT/GC content.

Of course, one should always consider the possibility of there being different 
gene species which are the same size and have the same GC/AT content. However, 
even these species are not unresolvable given some effort: again, one might use
SSCP, or perhaps a denaturing gradient gel electrophoresis (DGGE) or temperature
gradient gel electrophoresis (TGGE) approach to resolve the contents of a band,
either directly on the extracted band (Suzuki et al. 1991) or on the reamplified
product.

The requirement of some differential display techniques to visualize large 
numbers of products (e.g. DD and GEF) can also present a problem in that, in terms 
of numbers, the resolution of PAGE rarely exceeds 300-400 bands. One approach to
overcoming this might be to use 2-D gels such as those described by Uitterlinden et
al. (1989) and Hatada et al. (1991).

Extraction of differentially expressed bands from a gel can be complex since, in
some cases (e.g. DD, GEF), the results are visualized by autoradiographic means, 
such that precise overlay of the developed film on the gel must occur if the correct 
band is to be extracted for further analysis. Clearly, a misjudged extraction can 
account for many man-hours lost. This problem, and that of the use of radioisotopes, 
has been addressed by several groups. For example, Lohmann et al. (1995)
demonstrated that silver staining can be used directly to visualize DD bands in 
horizontal PAGs. An et al. (1996) avoided the use of radioisotopes by transferring a 
small amount (20-30%) of the DNA from their DD to a nylon membrane, and 
visualizing the bands using chemiluminescent staining before going back to extract 
the remaining DNA from the gel. Chen and Peck (1996) went one step further and 
transferred the entire DD to a nylon membrane. The DNA bands were then 
visualized using a digoxigenin (DIG) system (DIG was attached to the polydT 
primers used in the differential display procedure). Differentially expressed bands 
were cut from the membrane and the DNA eluted by washing with PCR buffer prior 
to reamplification. 

One of the advantages of using techniques such as SSH and RDA is that the final 
display can be run on an agarose gel and the bands visualized with simple ethidium 
bromide staining. Whilst this approach can provide acceptable results, overstaining 
with SYBR Green I or SYBR Gold nucleic acid stains (FMC) effectively enhances 
the intensity and sharpness of the bands. This greatly aids in their precise extraction 
and often reveals some faint products that may otherwise be overlooked. Whilst 
differential displays stained with SYBR Green I are better visualized using short 
wavelength UV (254 nm) rather than medium wavelength (306 nm), the shorter 
wavelength is much more DNA damaging. In practice, it takes only a few seconds 
to damage DNA extracted under 254 nm irradiation, effectively preventing 
reamplification and cloning. The best approach is to overstain with SYBR Green I 
and extract bands under a medium wavelength UV transillumination. 



The possible use of 'microfingerprinting' to reduce complexity 

Given the sheer number of gene products and the possible complexity of each
band, an alternative approach to rapid characterization may be to use an enhanced
analysis of a small section of a differential display: a 'sub-fingerprint' or
'micro-fingerprint'. In this case, one could concentrate on those bands which only appear
in a particular chosen size region. Reducing the fingerprint in this way has at least
two advantages. One is that it should be possible to use different gel types,
concentrations and run times tailored exactly to that region. Currently, one might
run products from 100-3000+ bp on the same gel, which leads to compromise in the
gel system being used and consequently to suboptimal resolution, both in terms of
size and numbers, and can lead to problems in the accurate excision of individual
bands. Secondly, it may be possible to enhance resolution by using a 2-D analysis
using a HA-stain, as described earlier. In summary, if a range of gene product sizes
is carefully chosen to include certain 'relevant' genes, the 2-D system standardized,
and appropriate gene analysis used, it may be possible to develop a method for the
early and rapid identification of compounds which have similar or widely different
cellular effects. If the prognosis for exposure to one or more other chemicals which
display a similar profile is already known, then one could perhaps predict similar
effects for any new compounds which show a similar micro-fingerprint.

An alternative approach to microfingerprinting is to examine altered expression
in specific families of genes through careful selection of PCR primers and/or post- 
reaction analysis. Stress genes, growth factors and/or their receptors, cell cycling 
genes, cytochromes P450 and regulatory proteins might be considered as candidates 
for analysis in this way. Indeed, some off-the-shelf DNA arrays (e.g. Clontech's
Atlas cDNA Expression Array series) already anticipate this to some degree by
grouping together genes involved in different responses, e.g. apoptosis, stress,
DNA-damage response etc.



Screening 

False positives 

The generation of false positives has been discussed at length amongst the 
differential display community (Liang et al. 1993, 1995, Nishio et al. 1994, Sun et al.
1994, Sompayrac et al. 1995). The reason for false positives varies with the
technique being used. For instance, in RDA, the use of adaptors which have not
been HPLC purified can lead to the production of false positives through illegitimate
ligation events (O'Neill and Sinclair 1997), whilst in DD they can arise through
PCR artifacts and illegitimate transcription of rRNA. In SH, false positives appear
to be derived largely from abundant gene species, although some may arise from 
cDNA/mRNA species which do not undergo hybridization for technical reasons. 

A quick screening of putative differentially expressed clones can be carried out 
using a simple dot blot approach, in which labelled first strand probes synthesized
from tester and driver mRNA are hybridized to an array of said clones (Hedrick et
al. 1984, Sakaguchi et al. 1986). Differentially expressed clones will hybridize to
tester probe, but not driver. The disadvantage of this approach is that rare species 
may not generate detectable hybridization signals. One option for those using SSH 
is to screen the clones using a labelled probe generated from the subtracted cDNA 
from which it was derived, and with a probe made from the reverse subtraction 
reaction (ClonTechniques 1997a). Since the SSH method enriches rare sequences, 
it should be possible to confirm the presence of clones representing low abundance 
genes. Despite this quick screening step, there is still the need to go back to the 
original mRNA and confirm the altered expression using a more quantitative 
approach. Although this may be achieved using Northern blots, the sensitivity is
poor by today's high standards and one must rely on PCR methods for accurate and 
sensitive determinations (see below). 
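
The logic of the dot blot screen can be sketched in a few lines. This is a minimal illustration only: the function name `classify_clone`, the signal values and the single `detection_limit` are assumptions made for the example and do not come from any published protocol.

```python
def classify_clone(tester: float, driver: float, detection_limit: float) -> str:
    """Classify one arrayed clone from its tester and driver hybridization signals."""
    in_tester = tester >= detection_limit
    in_driver = driver >= detection_limit
    if in_tester and not in_driver:
        return "candidate"          # hybridizes to the tester probe only
    if in_tester and in_driver:
        return "present in both"    # not tester-specific; possible false positive
    if in_driver:
        return "driver only"
    return "undetected"             # e.g. a rare species below the detection limit


# Hypothetical signal intensities (arbitrary units) for four clones
clones = {"c1": (850, 20), "c2": (900, 870), "c3": (15, 10), "c4": (30, 400)}
for name, (tester, driver) in clones.items():
    print(name, classify_clone(tester, driver, detection_limit=100))
```

The "undetected" class is the weakness noted above: a rare but genuinely differential message gives no signal with either probe, which is why a follow-up by a more sensitive method is still needed.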



Sequence analysis 

The majority of differential display procedures produce final products which are
between 100 and 1000 bp in size. However, this considerably limits the length of
sequence available for searching the DNA databases. This in turn leads to a reduced
confidence in the result: several families of genes have members whose DNA
sequences are almost identical except in a few key stretches, e.g. the cytochrome
P450 gene superfamily (Nelson et al. 1996). Thus, does the clone identified as being
almost identical to gene X really come from that gene, or its brother gene X1, or its
as yet undiscovered sister X2? For example, using SSH, part of a gene was isolated
which was up-regulated in the liver of rats exposed to Wy-14,643 and was identified
by a FASTA search as being transferrin (data not shown). However, transferrin is
known to be downregulated by hypolipidemic peroxisome proliferators such as
Wy-14,643 (Hertz et al. 1996), and this was confirmed with subsequent RT-PCR
analysis. This suggests that the gene sequence isolated may belong to a gene which
is closely related to transferrin, but is regulated by a different mechanism.

A further problem associated with SH technology is redundancy. In most cases 
before SH is carried out, the cDNA population must first be simplified by restriction 
digestion. This is important for at least two reasons:

(1) To reduce complexity: long cDNA fragments may form complex networks
which prevent the formation of appropriate hybrids, especially at the high 
concentrations required for efficient hybridization. 

(2) Cutting the cDNAs into small fragments provides better representation of 
individual genes. This is because genes derived from related but distinct 
members of gene families often have similar coding sequences that may cross- 
hybridize and be eliminated during the subtraction procedure (Ko 1990). 
Furthermore, different fragments from the same cDNA may differ considerably
in terms of hybridization and amplification and, thus, may not efficiently do one
or the other (Wang and Brown 1991). Thus, some fragments from differentially
expressed cDNAs may be eliminated during subtractive hybridization
procedures. However, other fragments may be enriched and isolated. As a
consequence of this, some genes will be cut one or more times, giving rise to two 
or more fragments of different sizes. If those same genes are differentially 
expressed, then two or more of the different size fragments may come through 
as separate bands on the final differential display, increasing the observed 
redundancy and increasing the number of redundant sequencing reactions. 
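
The way restriction digestion produces several fragments from one cDNA, and hence redundant bands, can be illustrated with a short sketch. It assumes a four-base blunt cutter that recognizes GTAC and cuts in the middle of the site (as RsaI does); the enzyme choice, the function name `digest` and the toy sequence are illustrative assumptions and are not taken from the text.

```python
def digest(seq: str, site: str = "GTAC", cut_offset: int = 2) -> list:
    """Cut seq at every occurrence of site.

    cut_offset is the position of the cut within the recognition site
    (2 gives GT^AC, a blunt cut in the middle of the site).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Hypothetical cDNA carrying two internal GTAC sites
cdna = "ATGCCGTACTTAGGCAAGTACCGGATTA"
fragments = digest(cdna)
print([len(f) for f in fragments])  # [7, 12, 9]
```

One differentially expressed gene here yields three fragments of different sizes, each of which could come through as a separate band and be sequenced separately.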

Sequence comparisons also throw up another important point: at what degree
of sequence similarity does one accept a result? Is 90% identity between a gene
derived from your model species and another acceptably close? Is 95% between
your sequence and one from the same species also acceptable? This problem is
particularly relevant when the forward and reverse sequence comparisons give
similar sequences with completely different gene species! An arbitrary decision
seems to be to allocate genes that are definite (95% and above similarity) and then
group those between 60 and 95% as being related or possible homologues.
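
The arbitrary cut-offs just described can be written down as a simple rule. The sketch below assumes the two sequences have already been aligned to equal length (gaps shown as '-'); the function names and the "unassigned" label for matches below 60% are assumptions added for the example.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if not a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def classify_match(identity: float) -> str:
    """Apply the 95% and 60% thresholds mentioned in the text."""
    if identity >= 95.0:
        return "definite"
    if identity >= 60.0:
        return "related or possible homologue"
    return "unassigned"


identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(identity, classify_match(identity))
```

A rule like this does not resolve the case where forward and reverse reads match different genes equally well; it only makes the chosen thresholds explicit and reproducible.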

Quantitative analysis 

At some point, one must give consideration to the quantitative analysis of the 
candidate genes, either as a means of confirming that they are truly differentially 
expressed, or in order to establish just what the differences are. Northern blot 
analysis is a popular approach as it is relatively easy and quick to perform. However, 
the major drawback with Northern blots is that they are often not sensitive enough 
to detect rare sequences. Since the majority of messages expressed in a cell are of low 
abundance (see table 1), this is a major problem. Consequently, RT-PCR may be the
method of choice for confirming differential expression. Although the procedure is
somewhat more complex than Northern analysis, requiring synthesis of primers and
optimization of reaction conditions for each gene species, it is now possible to set up
high throughput PCR systems using multichannel pipettes, 96+-well plates and
appropriate thermal cycling technology. Whilst quantitative analysis is more
desirable, being more accurate and without reliance on an internal standard, the 
money and time needed to develop a competitor molecule is often excessive, 
especially when one might be examining tens or even hundreds of gene species. The 
use of semi-quantitative analysis is simpler, although still relatively involved. One 
must first of all choose an internal standard that does not change in the test cells 
compared to the controls. Numerous reference genes have been tried in the past, for
example interferon-gamma (IFN-γ, Frye et al. 1989), β-actin (Heuval et al. 1994),
glyceraldehyde-3-phosphate dehydrogenase (GAPDH, Wong et al. 1994),
dihydrofolate reductase (DHFR, Mohler and Butler 1991), β-2-microglobulin (β-2-m,
Murphy et al. 1990), hypoxanthine phosphoribosyl transferase (HPRT, Foss et
al. 1998) and a number of others (ClonTechniques 1997b). Ideally, an internal
standard should not change its level of expression in the cell regardless of cell age,
stage in the cell cycle or through the effects of external stimuli. However, it has been
shown on numerous occasions that the levels of most housekeeping genes currently 
used by the research community do in fact change under certain conditions and in 
different tissues (ClonTechniques 1997b). It is imperative, therefore, that pre- 
liminary experiments be carried out on a panel of housekeeping genes to establish 
their suitability for use in the model system. 
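
One way to carry out such a preliminary check is to compare how much each candidate reference gene varies across the experimental conditions. The sketch below ranks genes by coefficient of variation and then normalizes a target gene to the chosen standard; the measurements are invented for illustration, and the use of the coefficient of variation as the stability criterion is an assumption rather than a method prescribed by the text.

```python
from statistics import mean, stdev


def coefficient_of_variation(values: list) -> float:
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def rank_reference_genes(signals: dict) -> list:
    """Return gene names ordered from most to least stable across conditions."""
    return sorted(signals, key=lambda gene: coefficient_of_variation(signals[gene]))


def normalise(target: list, reference: list) -> list:
    """Express each target signal relative to the internal standard."""
    return [t / r for t, r in zip(target, reference)]


# Hypothetical band intensities for three candidates in four conditions
panel = {
    "GAPDH": [100, 102, 98, 101],
    "beta-actin": [100, 150, 80, 120],
    "HPRT": [50, 55, 45, 52],
}
ranking = rank_reference_genes(panel)
print(ranking)
print(normalise([200, 408], [100, 102]))
```

With these made-up numbers the first gene in the ranking would be taken forward as the internal standard; with real data the outcome depends entirely on the model system, which is the point of running the panel first.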

Interpretation of quantitative data must also be treated with caution. By 
comparing the lists of genes identified by differential expression one can perhaps 
gain insight into why two different species react in different ways to external stimuli. 
For example, rats and mice appear sensitive to the non-genotoxic effects of a wide 
range of peroxisome proliferators whilst Syrian hamsters and guinea pigs are largely 
resistant (Orton et al. 1984, Rodricks and Turnbull 1987, Lake et al. 1989, 1993,
Makowska et al. 1992). A simplified approach to resolving the reason(s) why is to
compare lists of up- and down-regulated genes in order to identify those which are
expressed in only one species and, through background knowledge of the effects of
the said gene, might suggest a mechanism of facilitated non-genotoxic carcinogenesis 
or protection. Of course, the situation is likely to be far more complex. Perhaps if 
there were one key gene protecting guinea pig from non-genotoxic effects and it was 
upregulated 50 times by PPs, the same gene might only be up-regulated five times 
in the rat. However, since both were noted to be upregulated, the importance of the
gene may be overlooked. Just to complicate matters, a large change in expression
does not necessarily mean a biologically important change. For example, what is the
true relevance of gene Y which shows a 50-fold increase after a particular treatment,
and gene Z which shows only a 5-fold increase? If one examines the literature one
may find that historically, gene Y has often been shown to be up-regulated 40-60-fold
by a number of unrelated stimuli; in light of this the 50-fold increase would
appear less significant. However, the literature may show that gene Z has never been
recorded as having more than doubled in expression, which makes your 5-fold
increase all the more exciting. Perhaps even more interesting is if that same 5-fold
increase has only been seen in related neoplasms or following treatment with related
chemicals.
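
The pitfall of comparing simple gene lists between species can be shown with a small sketch. The gene names and fold-changes are hypothetical, chosen to mirror the 50-fold versus 5-fold example above, and the helper functions are assumptions introduced for illustration.

```python
def compare_species(up_a: dict, up_b: dict) -> tuple:
    """Split up-regulated genes into: A only, B only, and shared by both."""
    genes_a, genes_b = set(up_a), set(up_b)
    return genes_a - genes_b, genes_b - genes_a, genes_a & genes_b


def fold_change_ratios(up_a: dict, up_b: dict) -> dict:
    """For shared genes, the ratio of the fold-change in A to that in B."""
    shared = set(up_a) & set(up_b)
    return {gene: up_a[gene] / up_b[gene] for gene in shared}


# Hypothetical fold-changes after peroxisome proliferator treatment
guinea_pig = {"geneK": 50.0, "geneM": 3.0}
rat = {"geneK": 5.0, "geneN": 4.0}

only_guinea_pig, only_rat, both = compare_species(guinea_pig, rat)
ratios = fold_change_ratios(guinea_pig, rat)
print(only_guinea_pig, only_rat, both)
print(ratios)
```

A presence/absence comparison flags only geneM and geneN as species-specific; the ten-fold difference in the response of the shared geneK becomes visible only when the magnitudes are compared as well.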

Problems in using the differential display approach

Differential display technology originally held promise of an easily obtainable
'fingerprint' of those genes which are up- or down-regulated in test animals/cells in
a developmental process or following exposure to given stimuli. However, it has
become clear that the fingerprinting process, whilst still valid, is much too complex
to be represented by a single technique profile. This is because all differential display 
techniques have common and/or unique technical problems which preclude the 
isolation and identification of all those genes which show changes in expression. 
Furthermore, there are important genetic changes related to disease development 
which differential expression analysis is simply not designed to address. An example 
of this is the presence of small deletions, insertions, or point mutations such as those 
seen in activated oncogenes, tumour suppressor genes and individual poly- 
morphisms. Polymorphic variations, small though they usually are, are often 
regarded as being of paramount importance in explaining why some patients 
respond better than others to certain drug treatments (and, in logical extension, why 
some people are less affected by potentially dangerous xenobiotics/carcinogens than 
others). The identification of such point mutations and naturally occurring
polymorphisms requires the subsequent application of sequencing, SSCP, DGGE
or TGGE to the gene of interest. Furthermore, differential display is not designed
to address issues such as alternatively spliced gene species or whether an increased
abundance of mRNA is a result of increased transcription or increased mRNA 
stability. 



Conclusions 

Perhaps the main advantage of open system differential display techniques is that 
they are not limited by extant theories or researcher bias in revealing genes which are 
differentially expressed, since they are designed to amplify all genes which 
demonstrate altered expression. This means that they are useful for the isolation of 
previously unknown genes which may turn out to be useful biomarkers of a particular
state or condition. At least one open system (SAGE) is also quantitative, thus 
eliminating the need to return to the original mRNA and carry out Northern/PCR 
analysis to confirm the result. However, the rapid progress of genome mapping 
projects means that over the next 5-10 years or so, the balance of experimental use
will switch from open to closed differential display systems, particularly DNA 
arrays. Arrays are easier and faster to prepare and use, provide quantitative data, are 
suitable for high throughput analysis and can be tailored to look at specific signalling 
pathways or families of genes. Identification of all the gene sequences in human and
common laboratory animals, combined with improved DNA array technology,
means that it will soon no longer be necessary to try to isolate differentially expressed
genes using the technically more demanding open system approach. Thus, their
main advantage (that of identifying unknown genes) will be largely eradicated. It is
likely, therefore, that their sphere of application will be reduced to analysis of the
less common laboratory species, since it will be some time yet before the genomes of
such animals as zebrafish, electric eels, gerbils, crayfish and squid, for example, will
be sequenced.

Of course, in the end the question will always remain: What is the functional/ 
biological significance of the identified, differentially expressed genes? One
persistent problem is understanding whether differentially expressed genes are a
cause or consequence of the altered state. Furthermore, many chemicals, such as
non-genotoxic carcinogens, are also mitogens and so genes associated with
replication will also be upregulated but may have little or nothing to do with the
carcinogenic effect. Whilst differential display technology cannot hope to answer
these questions, it does provide a springboard from which identification, regulatory
and functional studies can be launched. Understanding the molecular mechanism of
cellular responses is almost impossible without knowing the regulation and function
[illegible] condition (e.g. mutated). In an abstract sense, differential
display can be likened to a still photograph, showing details of a fixed moment in
time. Consider the Historian who knows the outcome of a battle and the placement
and condition of the troops before the battle commenced, but is asked to try and
deduce how the battle progressed and why it ended as it did from a few still
photographs: an impossible task. In order to understand the battle, the Historian
must find out the capabilities and motivation of the soldiers and their commanding
officers, what the orders were and whether they were obeyed. He must examine the
terrain, the remains of the battle and consider the effect the prevailing weather
conditions exerted. Likewise, if mechanistic answers are to be forthcoming, the
scientist must use differential display in combination with other techniques such as
[illegible] analysis of cell signalling pathways, mutation analysis and
time and dose response analyses. Although this review has emphasized the
importance of differential gene profiling, it should not be considered in isolation and
the impact of this approach will be strengthened if used in combination with
functional genomics and proteomics (2-dimensional protein gels from isoelectric
focusing and subsequent SDS electrophoresis and virtual 2D-[illegible]
electrophoresis). Proteomics is attracting much recent attention as many of the
changes resulting in differential gene expression do not involve changes in mRNA
(as described extensively herein), but rather protein-protein, protein-DNA and
protein phosphorylation events which would require functional genomics or
proteomic technologies for investigation.

Despite the limitations of differential display technology, it is clear that many
potential applications and benefits can be obtained from characterizing the
changes that occur in a cell during normal and disease development and in response
to chemical or biological insult. In light of functional data, such profiling will
[illegible] of each stage of development or response, and in the long
term should help in the elucidation of specific and sensitive biomarkers for different
[illegible] chemical exposure and disease states. The potential medical and
therapeutic benefits of understanding such molecular changes are almost im[text
missing] even specific type of chemical an individual has been exposed to plus the length
and/or acuteness of that exposure, thus indicating the most prudent treatment.
They may also help uncover differences in histologically identical cancers, provide
diagnostic tests for the earliest stages of neoplasia and, again, perhaps indicate the
most efficacious treatment.

The Human Genome Project will be completed early in the next century and the
DNA sequence of all the human genes will be known. The continuing development
and evolution of differential gene expression technology will ensure that this
knowledge contributes fully to the understanding of human disease processes.

The availability of genome-scale DNA sequence information and reagents has radically altered life-science
research. This revolution has led to the development of a new scientific subdiscipline derived from a combina- 
tion of the fields of toxicology and genomics. This subdiscipline, termed toxicogenomics, is concerned with the 
identification of potential human and environmental toxicants, and their putative mechanisms of action, through 
the use of genomics resources. One such resource is DNA microarrays or "chips," which allow the monitoring of 
the expression levels of thousands of genes simultaneously. Here we propose a general method by which gene 
expression, as measured by cDNA microarrays, can be used as a highly sensitive and informative marker for 
toxicity. Our purpose is to acquaint the reader with the development and current state of microarray technology
and to present our view of the usefulness of microarrays to the field of toxicology. Mol. Carcinog. 24:153-159, 1999.
© 1999 Wiley-Liss, Inc.

INTRODUCTION

Technological advancements combined with in- 
tensive DNA sequencing efforts have generated an 
enormous database of sequence information over the 
past decade. To date, more than 3 million sequences, 
totaling over 2.2 billion bases [1], are contained 
within the GenBank database, which includes the 
complete sequences of 19 different organisms [2]. The 
first complete sequence of a free-living organism, 
Haemophilus influenzae, was reported in 1995 [3] and 
was followed shortly thereafter by the first complete 
sequence of a eukaryote, Saccharomyces cerevisiae [4].
The development of dramatically improved sequenc- 
ing methodologies promises that complete elucida- 
tion of the Homo sapiens DNA sequence is not far 
behind [5]. 

To exploit more fully the wealth of new sequence 
information, it was necessary to develop novel meth- 
ods for the high-throughput or parallel monitoring 
of gene expression. Established methods such as 
northern blotting, RNase protection assays, S1 nuclease
analysis, plaque hybridization, and slot blots
do not provide sufficient throughput to effectively 
utilize the new genomics resources. Newer methods 
such as differential display [6], high-density filter 
hybridization [7,8], serial analysis of gene expression 
[9], and cDNA- and oligonucleotide-based microarray 
"chip" hybridization [10-12] are possible solutions 
to this bottleneck. It is our belief that the microarray 
approach, which allows the monitoring of expres- 
sion levels of thousands of genes simultaneously, is 
a tool of unprecedented power for use in toxicology 
studies. 



Almost without exception, gene expression is al- 
tered during toxicity, as either a direct or indirect 
result of toxicant exposure. The challenge facing 
toxicologists is to define, under a given set of ex- 
perimental conditions, the characteristic and spe- 
cific pattern of gene expression elicited by a given 
toxicant. Microarray technology offers an ideal plat- 
form for this type of analysis and could be the foun- 
dation for a fundamentally new approach to 
toxicology testing. 

MICROARRAY DEVELOPMENT AND APPLICATIONS 

cDNA Microarrays 

In the past several years, numerous systems were 
developed for the construction of large-scale DNA 
arrays. All of these platforms are based on cDNAs 
or oligonucleotides immobilized to a solid sup- 
port. In the cDNA approach, cDNA (or genomic) 
clones of interest are arrayed in a multi-well for- 
mat and amplified by polymerase chain reaction. 
The products of this amplification, which are usu- 
ally 500- to 2000-bp clones from the 3' regions of 
the genes of interest, are then spotted onto solid 
support by using high-speed robotics. By using 
this method, microarrays of up to 10 000 clones 
can be generated by spotting onto a glass substrate
[13,14]. Sample detection for microarrays on glass
involves the use of probes labeled with fluores- 
cent or radioactive nucleotides. 

Fluorescent cDNA probes are generated from con- 
trol and test RNA samples in single-round reverse-tran- 
scription reactions in the presence of fluorescently 
tagged dUTP (e.g., Cy3-dUTP and Cy5-dUTP), which 
produces control and test products labeled with dif- 
ferent fluors. The cDNAs generated from these two 
populations, collectively termed the "probe," are then
mixed and hybridized to the array under a glass cov- 
erslip [10,11,15]. The fluorescent signal is detected 
by using a custom-designed scanning confocal mi- 
croscope equipped with a motorized stage and lasers 
for fluor excitation [10,11,15]. The data are analyzed 
with custom digital image analysis software that de- 
termines for each DNA feature the ratio of fluor 1 to 
fluor 2, corrected for local background [16,17]. The
strength of this approach lies in the ability to label 
RNAs from control and treated samples with differ- 
ent fluorescent nucleotides, allowing for the simul- 
taneous hybridization and detection of both 
populations on one microarray. This method elimi- 
nates the need to control for hybridization between 
arrays. The research groups of Drs. Patrick Brown and 
Ron Davis at Stanford University spearheaded the 
effort to develop this approach, which has been suc- 
cessfully applied to studies of Arabidopsis thaliana 
RNA [10], yeast genomic DNA [15], tumorigenic ver- 
sus non-tumorigenic human tumor cell lines [11], 
human T-cells [18], yeast RNA [19], and human in- 
flammatory disease-related genes [20]. The most dra- 
matic result of this effort was the first published 
account of gene expression of an entire genome, that 
of the yeast Saccharomyces cerevisiae [21].
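
The per-feature calculation performed by the image analysis software can be sketched as follows. This is an illustrative reduction, not the published software: the function names, the example intensities and the `floor` value used to avoid division by zero or negative signals are all assumptions.

```python
import math


def corrected_ratio(fg1: float, bg1: float, fg2: float, bg2: float,
                    floor: float = 1.0) -> float:
    """Ratio of fluor 1 to fluor 2 after subtracting each local background.

    Background-subtracted signals are clamped to `floor` so that weak
    spots do not produce zero or negative denominators.
    """
    signal1 = max(fg1 - bg1, floor)
    signal2 = max(fg2 - bg2, floor)
    return signal1 / signal2


def log2_ratio(fg1: float, bg1: float, fg2: float, bg2: float) -> float:
    """Log2 of the corrected ratio, symmetric for up- and down-regulation."""
    return math.log2(corrected_ratio(fg1, bg1, fg2, bg2))


# Hypothetical spot: fluor 1 = 1100 over background 100, fluor 2 = 600 over 100
print(corrected_ratio(1100, 100, 600, 100))
print(log2_ratio(1100, 100, 600, 100))
```

Because both populations are hybridized to the same spot, the ratio is insensitive to how much DNA was deposited at that feature, which is the practical advantage of the two-colour design described above.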

In an alternative approach, large numbers of cDNA 
clones can be spotted onto a membrane support, al- 
beit at a lower density [7,22]. This method is useful 
for expression profiling and large-scale screening and 
mapping of genomic or cDNA clones [7,22-24]. In 
expression profiling on filter membranes, two dif- 
ferent membranes are used simultaneously for con- 
trol and test RNA hybridizations, or a single 
membrane is stripped and reprobed. The signal is 
detected by using radioactive nucleotides and
visualized by phosphorimager analysis or
autoradiography. Numerous companies now sell such cDNA
membranes and software to analyze the image data 
[25-27]. 

Oligonucleotide Microarrays 

Oligonucleotide microarrays are constructed either 
by spotting prefabricated oligos on a glass support 
[13] or by the more elegant method of direct in situ 
oligo synthesis on the glass surface by photolithog- 
raphy [28-30]. The strength of this approach lies in 
its ability to discriminate DNA molecules based on
a single base-pair difference. This allows the
application of this method to the fields of medical
diagnostics, pharmacogenetics, and sequencing by
hybridization as well as gene-expression analysis.

Fabrication of oligonucleotide chips by photoli- 
thography is theoretically simple but technically 
complex [29,30]. The light from a high-intensity 
mercury lamp is directed through a photolitho- 
graphic mask onto the silica surface, resulting in 
deprotection of the terminal nucleotides in the illu- 
minated regions. The entire chip is then reacted with 
the desired free nucleotide, resulting in selected chain 
elongation. This process requires only 4n cycles 
(where n = oligonucleotide length in bases) to syn- 
thesize a vast number of unique oligos, the total num- 
ber of which is limited only by the complexity of the 
photolithographic mask and the chip size [29,31,32]. 
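
The arithmetic behind the 4n figure can be made concrete with a short calculation. The sketch simply contrasts the linear growth in coupling cycles with the exponential growth in the number of possible sequences; the function names and the example lengths are assumptions chosen for illustration.

```python
def synthesis_cycles(n: int) -> int:
    """Coupling cycles needed: one per base (A, C, G, T) at each of n positions."""
    return 4 * n


def possible_oligos(n: int) -> int:
    """Number of distinct sequences of length n over a four-letter alphabet."""
    return 4 ** n


for n in (8, 15, 25):
    print(f"n={n}: {synthesis_cycles(n)} cycles, "
          f"{possible_oligos(n):,} possible sequences")
```

In practice the number of oligos actually placed on a chip is far smaller than 4^n, since, as noted above, it is bounded by the mask complexity and the chip size.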

Sample preparation involves the generation of 
double-stranded cDNA from cellular poly(A)+ RNA 
followed by antisense RNA synthesis in an in vitro 
transcription reaction with biotinylated or fluor- 
tagged nucleotides. The RNA probe is then frag- 
mented to facilitate hybridization. If the indirect 
visualization method is used, the chips are incubated 
with fluor-linked streptavidin (e.g., phycoerythrin) 
after hybridization [12,33]. The signal is detected with 
a custom confocal scanner [34]. This method has 
been applied successfully to the mapping of genomic 
library clones [35], to de novo sequencing by hybrid- 
ization [28,36], and to evolutionary sequence com- 
parison of the BRCA1 gene [37]. In addition, 
mutations in the cystic fibrosis [38] and BRCA1 [39] 
gene products and polymorphisms in the human im- 
munodeficiency virus-1 clade B protease gene [40] 
have been detected by this method. Oligonucleotide 
chips are also useful for expression monitoring [33] 
as has been demonstrated by the simultaneous evalu- 
ation of gene-expression patterns in nearly all open 
reading frames of the yeast strain S. cerevisiae [12]. 
More recently, oligonucleotide chips have been used 
to help identify single nucleotide polymorphisms in 
the human [41] and yeast [42] genomes. 

THE USE OF MICROARRAYS IN TOXICOLOGY 

Screening for Mechanism of Action 

The field of toxicology uses numerous in vivo 
model systems, including the rat, mouse, and rab- 
bit, to assess potential toxicity and these bioassays 
are the mainstay of toxicology testing. However, in 
the past several decades, a plethora of in vitro tech- 
niques have been developed to measure toxicity, 
many of which measure toxicant-induced DNA dam- 
age. Examples of these assays include the Ames test, 
the Syrian hamster embryo cell transformation as- 
say, micronucleus assays, measurements of sister 
chromatid exchange and unscheduled DNA synthe- 
sis, and many others. Fundamental to all of these 
methods is the fact that toxicity is often preceded 
by, and results in, alterations in gene expression. In 
many cases, these changes in gene expression are a 
far more sensitive, characteristic, and measurable 
endpoint than the toxicity itself. We therefore pro- 
pose that a method based on measurements of the 
genome-wide gene expression pattern of an organ- 
ism after toxicant exposure is fundamentally infor- 
mative and complements the established methods 
described above. 

We are developing a method by which toxicants 
can be identified and their putative mechanisms of 
action determined by using toxicant-induced gene ex- 
pression profiles. In this method, in one or more de- 
fined model systems, dose and time-course parameters 
are established for a series of toxicants within a given 
prototypic class (e.g., polycyclic aromatic hydrocar- 
bons (PAHs)). Cells are then treated with these agents 
at a fixed toxicity level (as measured by cell survival), 
RNA is harvested, and toxicant-induced gene expres- 
sion changes are assessed by hybridization to a cDNA 
microarray chip (Figure 1). We have developed a custom 
DNA chip, called ToxChip v1.0, specifically for 
this purpose and will discuss it in more detail below. 
The changes in gene expression induced by the test 
agents in the model systems are analyzed, and the 
common set of changes unique to that class of toxi- 
cants, termed a toxicant signature, is determined. 

This signature is derived by ranking the gene-expression 
data across all experiments based on relative 
fold induction or suppression of genes in treated 
samples versus untreated controls and selecting the 
most consistently different signals across the sample 
set. A different signature may be established for each 
prototypic toxicant class. Once the signatures are de- 
termined, gene-expression profiles induced by un- 
known agents in these same model systems can then 
be compared with the established signatures. A match 
assigns a putative mechanism of action to the test 
compound. Figure 2 illustrates this signature method 
for different types of oxidant stressors, PAHs, and 
peroxisome proliferators. In this example, the unknown 
compound in question had a gene-expression 
profile similar to that of the oxidant stressors in 
the database. We anticipate that this general method 
will also reveal cross talk between different pathways 
induced by a single agent (e.g., reveal that a compound 
has both PAH-like and oxidant-like properties). 
In the future, it may be necessary to distinguish 
very subtle differences between compounds within 
a very large sample set (e.g., thousands of highly simi- 
lar structural isomers in a combinatorial chemistry 
library or peptide library). To generate these highly 
refined signatures, standard statistical clustering tech- 
niques or principal-component analysis can be used. 
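
The signature procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: each experiment is assumed to be a mapping from gene name to log2(treated/control); a gene is scored by its smallest absolute change across experiments, and only if every experiment changes it in the same direction; the top-ranked genes form the signature; and an unknown profile is scored by the fraction of signature genes it changes at least two-fold in the matching direction. The gene names (A1, A2, B1, C1, echoing the gene groups in Figure 2), the numerical values, and the function names are all made up for the example. Clustering and principal-component analysis are not shown.

```python
def toxicant_signature(experiments, top_k):
    """Rank genes by consistency of change across experiments; keep the top_k.

    experiments: list of dicts mapping gene -> log2(treated/control).
    Returns a dict mapping gene -> +1 (induced) or -1 (suppressed).
    """
    genes = set.intersection(*(set(e) for e in experiments))
    scored = []
    for gene in sorted(genes):
        values = [e[gene] for e in experiments]
        # Keep only genes that move in the same direction in every experiment.
        if all(v > 0 for v in values) or all(v < 0 for v in values):
            direction = 1 if values[0] > 0 else -1
            consistency = min(abs(v) for v in values)  # weakest response
            scored.append((consistency, gene, direction))
    scored.sort(key=lambda item: item[0], reverse=True)
    return {gene: direction for _, gene, direction in scored[:top_k]}


def match_score(profile, signature, min_abs_log2=1.0):
    """Fraction of signature genes changed >= 2-fold in the signature's direction."""
    if not signature:
        return 0.0
    hits = sum(
        1
        for gene, direction in signature.items()
        if direction * profile.get(gene, 0.0) >= min_abs_log2
    )
    return hits / len(signature)


# Hypothetical log2 ratios for three known oxidant stressors.
oxidant_experiments = [
    {"A1": 3.1, "A2": 1.8, "B1": 0.2, "C1": 0.0},
    {"A1": 2.6, "A2": 1.5, "B1": -0.3, "C1": 0.1},
    {"A1": 2.9, "A2": 2.0, "B1": 0.1, "C1": -0.1},
]
signature = toxicant_signature(oxidant_experiments, top_k=2)

# Hypothetical profile of an uncharacterized compound.
unknown = {"A1": 2.2, "A2": 1.3, "B1": 0.0, "C1": 0.0}
print(signature, match_score(unknown, signature))
```

A score near 1 for one class and near 0 for the others would suggest a single putative mechanism; intermediate scores against several signatures would be one way to flag the cross talk mentioned above.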

For the studies outlined in Figure 2, we developed 
the custom cDNA microarray chip ToxChip v1.0. 

Figure 1. Simplified overview of the method for sample 
preparation and hybridization to cDNA microarrays. For illustrative 
purposes, samples derived from cell culture are depicted, 
although other sample types are amenable to this analysis. 

Figure 2. Schematic representation of the method for identification 
of a toxicant's mechanism of action. In this method, 
gene-expression data derived from exposure of model systems 
to known toxicants are analyzed, and a set of changes 
characteristic of that type of toxicant (termed the toxicant 
signature) is identified. As depicted, oxidant stressors produce 
consistent changes in group A genes (indicated by red and 
green circles), but not group B or C genes (indicated by gray 
circles). The set of gene-expression changes elicited by the 
suspected toxicant is then compared with these characteristic 
patterns, and a putative mechanism of action is assigned to 
the unknown agent. 



The 2090 human genes that comprise this subarray 
were selected for their well-documented involve- 
ment in basic cellular processes as well as their re- 
sponses to different types of toxic insult. Included 
on this list are DNA replication and repair genes, 
apoptosis genes, and genes responsive to PAHs and 
dioxin-like compounds, peroxisome proliferators, 
estrogenic compounds, and oxidant stress. Some of 
the other categories of genes include transcription 
factors, oncogenes, tumor suppressor genes, cyclins, 
kinases, phosphatases, cell adhesion and motility 
genes, and homeobox genes. Also included in this 
group are 84 housekeeping genes, whose hybridiza- 
tion intensity is averaged and used for signal nor- 
malization of the other genes on the chip. To date, 
very few toxicants have been shown to have appre- 
ciable effects on the expression of these housekeep- 
ing genes. However, this housekeeping list will be 
revised if new data warrant the addition or deletion 
of a particular gene. Table 1 contains a general de- 
scription of some of the different classes of genes 
that comprise ToxChip v1.0. 
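
The normalization step mentioned above (averaging the housekeeping signals and using that average to scale the other genes) can be sketched as follows. This is a minimal sketch that assumes normalization by simple division; the text does not spell out the exact arithmetic. The gene labels, intensities, and function name are hypothetical.

```python
from statistics import mean


def normalize_to_housekeeping(intensities, housekeeping_genes):
    """Divide every spot intensity by the mean intensity of the housekeeping spots."""
    reference = mean(intensities[gene] for gene in housekeeping_genes)
    if reference <= 0:
        raise ValueError("housekeeping reference must be positive")
    return {gene: value / reference for gene, value in intensities.items()}


# Made-up raw intensities for two housekeeping spots and two test genes.
raw = {"HK1": 900.0, "HK2": 1100.0, "geneX": 2500.0, "geneY": 250.0}
normalized = normalize_to_housekeeping(raw, ["HK1", "HK2"])
print(normalized)
```

Because every array is scaled by its own housekeeping mean, normalized values from control and treated hybridizations can be compared directly, provided the toxicant does not itself shift the housekeeping genes.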

When a toxicant signature is determined, the 
genes within this signature are flagged within the 
database. When uncharacterized toxicants are then 
screened, the data can be quickly reformatted so that 
blocks of genes representing the different signatures 
are displayed [11]. This facilitates rapid, visual 
interpretation of data. We are also developing ToxChip 
v2.0 and chips for other model systems, 
including rat, mouse, Xenopus, and yeast, for use in 
toxicology studies. 

Animal Models in Toxicology Testing 

The toxicology community relies heavily on the 
use of animals as model systems for toxicology test- 
ing. Unfortunately, these assays are inherently ex- 
pensive, require large numbers of animals and take a 
long time to complete and analyze. Therefore, the 
National Institute of Environmental Health Sciences 
(NIEHS), the National Toxicology Program, and the 
toxicology community at large are committed to re- 
ducing the number of animals used, by developing 
more efficient and alternative testing methodologies. 
Although substantial progress has been made in the 
development of alternative methods, bioassays are 
still used for testing endpoints such as neurotoxic- 
ity, immunotoxicity, reproductive and developmen- 
tal toxicology, and genetic toxicology. The rodent 
cancer bioassay is a particularly expensive and time- 
consuming assay, as it requires almost 4 yr, 1200 
animals, and millions of dollars to execute and ana- 
lyze [43]. In vitro experiments of the type outlined 
in Figure 2 might provide evidence that an unknown 
agent is (or is not) responsible for eliciting a given 
biological response. This information would help to 
select a bioassay more specifically suited to the agent 
in question or perhaps suggest that a bioassay is not 
necessary, which would dramatically reduce cost, 
animal use, and time. 

Table 1. ToxChip v1.0: A Human cDNA Microarray 
Chip Designed to Detect Responses to Toxic Insult 

Gene category                            No. of genes on chip 

Apoptosis                                 72 
DNA replication and repair                99 
Oxidative stress/redox homeostasis        90 
Peroxisome proliferator responsive        22 
Dioxin/PAH responsive                     12 
Estrogen responsive                       63 
Housekeeping                              84 
Oncogenes and tumor suppressor genes      76 
Cell-cycle control                        51 
Transcription factors                    131 
Kinases                                  276 
Phosphatases                              88 
Heat-shock proteins                       23 
Receptors                                349 
Cytochrome P450s                          30 

*This list is intended as a general guide. The gene categories are not 
unique, and some genes are listed in multiple categories. 

The addition of microarray techniques to stan- 
dard bioassays may dramatically enhance the sen- 
sitivity and interpretability of the bioassay and 
possibly reduce its cost. Gene-expression signatures 
could be determined for various types of tissue-spe- 
cific toxicants, and new compounds could be 
screened for these characteristic signatures, provid- 
ing a rapid and sensitive in vivo test. Also, because 
gene expression is often exquisitely sensitive to low 
doses of a toxicant, the combination of gene-expression 
screening and the bioassay might allow the use 
of lower toxicant doses, which are more relevant to 
human exposure levels, and the use of fewer ani- 
mals. In addition, gene-expression changes are nor- 
mally measured in hours or days, not in the months 
to years required for tumor development. Further- 
more, microarrays might be particularly useful for 
investigating the relationship between acute and 
chronic toxicity and identifying secondary effects 
of a given toxicant by studying the relationship 
between the duration of exposure to a toxicant and 
the gene-expression profile produced. Thus, a bio- 
assay that incorporates gene-expression signatures 
with traditional endpoints might be substantially 
shorter, use more realistic dose regimens, and cost 
substantially less than the current assays do. 

These considerations are also relevant for branches 
of toxicology not related to human health and not 
using rodents as model systems, such as aquatic toxi- 
cology and plant pathology. Bioassays based on the 
fathead minnow, Daphnia, and Arabidopsis could 
also be improved by the addition of microarray analysis. 
The combination of microarrays with traditional 
bioassays might also be useful for investigating some 
of the more intractable problems in toxicology re- 
search, such as the effects of complex mixtures and 
the difficulties in cross-species extrapolation. 

Exposure Assessment, Environmental Monitoring, 
and Drug Safety 

The currently used methods for assessment of ex- 
posure to chemical toxicants are based on measure- 
ment of tissue toxin levels or on surrogate markers 
of toxicity, termed biomarkers (e.g., peripheral blood 
levels of hepatic enzymes or DNA adducts). Because 
gene expression is a sensitive endpoint, gene expres- 
sion as measured with microarray technology may 
be useful as a new biomarker to more precisely iden- 
tify hazards and to assess exposure. Similarly, 
microarrays could be used in an environmental- 
monitoring capacity to measure the effect of poten- 
tial contaminants on the gene-expression profiles 
of resident organisms. In an analogous fashion, 
microarrays could be used to measure gene-expres- 
sion endpoints in subjects in clinical trials. The com- 
bination of these gene-expression data and more 
established toxic endpoints in these trials could be 
used to define highly precise surrogates of safety. 

Gene-expression profiles in samples from exposed 
individuals could be compared to the profiles of the 
same individuals before exposure. From this infor- 
mation, the nature of the toxic exposure can be de- 
termined or a relative clinical safety factor estimated. 
In the future it may also be possible to estimate not 
only the nature but the dose of the toxicant for a 
given exposure, based on relative gene-expression 
levels. This general approach may be particularly 
appropriate for occupational-health applications, in 
which unexposed and exposed samples from the 
same individuals may be obtainable. For example, 
a pilot study of gene expression in peripheral-blood 
lymphocytes of Polish coke-oven workers exposed 
to PAHs (and many other compounds) is under con- 
sideration at the NIEHS. An important consideration 
for these types of studies is that gene expression can 
be affected by numerous factors, including diet, 
health, and personal habits. To reduce the effects 
of these confounding factors, it may be necessary 
to compare pools of control samples with pools of 
treated samples. In the future it may be possible to 
compare exposed sample sets to a national database 
of human-expression data, thus eliminating the 
need to provide an unexposed sample from the same 
individual. Efforts to develop such a national gene- 
expression database are currently underway [44,45]. 
However, this national database approach will re- 
quire a better understanding of genome-wide gene 
expression across the highly diverse human popu- 
lation and of the effects of environmental factors 
on this expression. 

Alleles, Oligo Arrays, and Toxicogenetics 

Gene sequences vary between individuals, and 
this variability can be a causative factor in human 
diseases of environmental origin [46,47]. A new area 
of toxicology, termed toxicogenetics, was recently 
developed to study the relationship between genetic 
variability and toxicant susceptibility. This field is 
not the subject of this discussion, but it is worth- 
while to note that the ability of oligonucleotide ar- 
rays to discriminate DNA molecules based on single 
base-pair differences makes these arrays uniquely 
useful for this type of analysis. Recent reports dem- 
onstrated the feasibility of this approach [41,42]. 
The NIEHS has initiated the Environmental Genome 
Project to identify common sequence polymor- 
phisms in 200 genes thought to be involved in en- 
vironmental diseases [48]. In a pilot study on the 
feasibility of this application to the Environmental 
Genome Project, oligonucleotide arrays will be used 
to resequence 20 candidate genes. This toxicogenetic 
approach promises to dramatically improve our un- 
derstanding of interindividual variability in disease 
susceptibility. 

FUTURE PRIORITIES 

There are many issues that must be addressed be- 
fore the full potential of microarrays in toxicology 
research can be realized. Among these are model sys- 
tem selection, dose selection, and the temporal na- 
ture of gene expression. In other words, in which 
species, at what dose, and at what time do we look 
for toxicant-induced gene expression? If human 
samples are analyzed, how variable is global gene 
expression between individuals, before and after toxi- 
cant exposure? What are the effects of age, diet, and 
other factors on this expression? Experience, in the 
form of large data sets of toxicant exposures, will 
answer these questions. 

One of the most pressing issues for array scientists 
is the construction of a national public database 
(linked to the existing public databases) to serve as a 
repository for gene-expression data. This relational 
database must be made available for public use, and 
researchers must be encouraged to submit their ex- 
pression data so that others may view and query the 
information. Researchers at the National Institutes 
of Health have made laudable progress in developing 
the first generation of such a database [44,45]. In 
addition, improved statistical methods for gene clus- 
tering and pattern recognition are needed to ana- 
lyze the data in such a public database. 

The proliferation of different platforms and meth- 
ods for microarray hybridizations will improve 
sample handling and data collection and analysis and 
reduce costs. However, the variety of microarray 
methods available will create problems of data compatibility 
between platforms. In addition, the near-infinite 
variety of experimental conditions under 
which data will be collected by different laboratories 
will make large-scale data analysis extremely difficult. 
To help circumvent these future problems, a 
set of standards to be included on all platforms 
should be established. These standards would facili- 
tate data entry into the national database and serve 
as reference points for cross-platform and inter-labo- 
ratory data analysis. 

Many issues remain to be resolved, but it is clear 
that new molecular techniques such as microarray 
hybridization will have a dramatic impact on toxicol- 
ogy research. In the future, the information gathered 
from microarray-based hybridization experiments will 
form the basis for an improved method to assess the 
impact of chemicals on human and environmental 
health. 

Abstract 

Recent progress in genomics and proteomics technologies has created a unique opportunity to significantly impact 
the pharmaceutical drug development processes. The perception that cells and whole organisms express specific 
inducible responses to stimuli such as drug treatment implies that unique expression patterns, molecular fingerprints, 
indicative of a drug's efficacy and potential toxicity are accessible. The integration into state-of-the-art toxicology of 
assays allowing one to profile treatment-related changes in gene expression patterns promises new insights into 
mechanisms of drug action and toxicity. The benefits will be improved lead selection, and optimized monitoring of 
drug efficacy and safety in pre-clinical and clinical studies based on biologically relevant tissue and surrogate markers. 
© 2000 Elsevier Science Ireland Ltd. All rights reserved. 

1. Introduction 

The majority of drugs act by binding to protein 
targets, most to known proteins representing en- 
zymes, receptors and channels, resulting in effects 
such as enzyme inhibition and impairment of 
signal transduction. The treatment-induced per- 
turbations provoke feedback reactions aiming to 
compensate for the stimulus, which almost always 
are associated with signals to the nucleus, resulting 
in altered gene expression. Such gene expression 
regulations account for both the 
pharmacological action and the toxicity of a drug 
and can be visualized by either global mRNA or 
global protein expression profiling. Hence, for 
each individual drug, a characteristic gene regula- 
tion pattern, its molecular fingerprint, exists 
which bears valuable information on its mode of 
action and its mechanism of toxicity. 

Gene expression is a multistep process that 
results in an active protein (Fig. 1). There exist 
numerous regulation systems that exert control at 
and after the transcription and the translation 
step. Genomics, by definition, encompasses the 
quantitative analysis of transcripts at the mRNA 
level, while the aim of proteomics is to quantify 
gene expression further down-stream, creating a 
snapshot of gene regulation closer to ultimate cell 
function control. 

2. Global mRNA profiling 

Expression data at the mRNA level can be 
produced using a set of different technologies 
such as DNA microarrays, reverse transcript 
imaging, amplified fragment length polymorphism 
(AFLP), serial analysis of gene expression 
(SAGE) and others. Currently, DNA microarrays 
are very popular and hold great potential. 
On a typical array, each gene of interest is repre- 
sented either by a long DNA fragment (200-2400 
bp) typically generated by polymerase chain reac- 
tion (PCR) and spotted on a suitable substrate 
using robotics (Schena et al., 1995; Shalon et al., 
1996) or by several short oligonucleotides (20-30 
bp) synthesized directly onto a solid support using 
photolabile nucleotide chemistry (Fodor et al., 
1991; Chee et al., 1996). From control and treated 
tissues, total RNA or mRNA is isolated and 
reverse transcribed in the presence of radioactive 
or fluorescent labeled nucleotides, and the labeled 
probes are then hybridized to the arrays. The 
intensity of the array signal is measured for each 
gene transcript by either autoradiography or laser 
scanning confocal microscopy. The ratio between 
the signals of control and treated samples reflects 
the relative drug-induced change in transcript 
abundance. 
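
As a concrete illustration of the ratio calculation, the sketch below computes a per-gene ratio and its log2 value from two sets of array signals. It is an assumption-laden example, not a method from the cited papers: the ratio is taken as treated over control, signals below a floor are raised to that floor to avoid dividing by near-zero background, and the gene labels, signal values, and function name are hypothetical.

```python
import math


def expression_ratios(control, treated, floor=1.0):
    """Return gene -> (treated/control ratio, log2 ratio) for genes present in both."""
    result = {}
    for gene in control.keys() & treated.keys():
        c = max(control[gene], floor)  # guard against near-zero background
        t = max(treated[gene], floor)
        ratio = t / c
        result[gene] = (ratio, math.log2(ratio))
    return result


# Made-up signal intensities for three transcripts.
control = {"g1": 200.0, "g2": 800.0, "g3": 500.0}
treated = {"g1": 800.0, "g2": 200.0, "g3": 500.0}
ratios = expression_ratios(control, treated)
print(ratios)
```

The log2 scale is used here because it treats induction and suppression symmetrically: a four-fold increase and a four-fold decrease become +2 and -2.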

3. Global protein profiling 

Global quantitative expression analysis at the 
protein level is currently restricted to the use of 
two-dimensional gel electrophoresis. This tech- 
nique combines separation of tissue proteins by 
isoelectric focusing in the first dimension and by 
sodium dodecyl sulfate slab gel electrophoresis- 
based molecular weight separation on the second, 
orthogonal dimension (Anderson et al., 1991). 
The product is a rectangular pattern of protein 
spots that are typically revealed by Coomassie 
Blue, silver or fluorescent staining (Fig. 2). 
Protein spots are identified by mass spectrometry 
following generation of peptide mass fingerprints 
(Mann et al., 1993) and sequence tags (Wilkins et 
al., 1996). Similar to the mRNA approach, the 
optical densities of spots from control and treated 
samples are compared to search for 
treatment-related changes. 

4. Expression data analysis 

Bioinformatics forms a key element required to 
organize, analyze and store expression data from 
either source, the mRNA or the protein level. The 
overall objective, once a mass of high-quality 
quantitative expression data has been collected, is 
to visualize complex patterns of gene expression 
changes, to detect pathways and sets of genes 
tightly correlated with treatment efficacy and toxicity, 
and to compare the effects of different sets of 
treatment (Anderson et al., 1996). As the drug 
effect database is growing, one may detect similarities 
and differences between the molecular fingerprints 
produced by various drugs, information 
that may be crucial to make a decision whether to 
refocus or extend the therapeutic spectrum of a 
drug candidate. 

Fig. 2. Computerized representation of a Coomassie Blue stained two-dimensional gel electrophoresis pattern of Fischer F344 rat 
liver homogenate. 



5. Comparison of global mRNA and protein 
expression profiling 

There are several synergies and overlaps of data 
obtained by mRNA and protein expression analysis. 
Low-abundance transcripts may not be easily 
quantified at the protein level using standard two- 
dimensional gel electrophoresis analysis and their 
detection may require prefractionation of sam- 
ples. The expression of such genes may be prefer- 
ably quantified at the mRNA level using 
techniques allowing PCR-mediated target amplification. 
Tissue biopsy samples typically yield good 
quality of both mRNA and proteins; however, the 
quality of mRNA isolated from body fluids is 
often poor due to the faster degradation of 
mRNA when compared with proteins. RNA sam- 
ples from body fluids such as serum or urine are 
often not very 'meaningful', and secreted proteins 
are likely more reliable surrogate markers for 
treatment efficacy and safety. Detection of post- 
translational modifications, events often related to 
function or nonfunction of a protein, is restricted 
to protein expression analysis and rarely can be 
predicted by mRNA profiling. Information on 
subcellular localization and translocation of 
proteins has to be acquired at the level of the 
protein in combination with sample prefractiona- 
tion procedures. The growing evidence of a poor 
correlation between mRNA and protein abun- 
dance (Anderson and Seilhamer, 1997) further 
suggests that the two approaches, mRNA and 
protein profiling, are complementary and should 
be applied in parallel. 

6. Expression profiling and drug development 

Understanding the mechanisms of action and 
toxicity, and being able to monitor treatment 
efficacy and safety during trials, are crucial for the 
successful development of a drug. Mechanistic 
insights are essential for the interpretation of drug 
effects and enhance the chances of recognizing 
potential species specificities contributing to an 
improved risk profile in humans (Richardson et 
al., 1993; Steiner et al., 1996b; Aicher et al., 1998). 
The value of expression profiling further increases 
when links between treatment-induced expression 
profiles and specific pharmacological and toxic 
endpoints are established (Anderson et al., 1991, 
1995, 1996; Steiner et al., 1996a). Changes in gene 
expression are known to precede the manifesta- 
tion of morphological alterations, giving expres- 
sion profiling a great potential for early 
compound screening, enabling one to select drug 
candidates with wide therapeutic windows 
reflected by molecular fingerprints indicative of 
high pharmacological potency and low toxicity 
(Arce et al., 1998). In later phases of drug development, 
surrogate markers of treatment efficacy 
and toxicity can be applied to optimize the moni- 
toring of pre-clinical and clinical studies (Doherty 
et al., 1998). 



7. Perspectives 

The basic methodology of safety evaluation has 
changed little during the past decades. Toxicity in 
laboratory animals has been evaluated primarily 
by using hematological, clinical chemistry and 
histological parameters as indicators of organ 
damage. The rapid progress in genomics and pro- 
teomics technologies creates a unique opportunity 
to dramatically improve the predictive power of 
safety assessment and to accelerate the drug devel- 
opment process. Application of gene and protein 
expression profiling promises to improve lead se- 
lection, resulting in the development of drug can- 
didates with higher efficacy and lower toxicity. 
The identification of biologically relevant surro- 
gate markers correlated with treatment efficacy 
and safety bears a great potential to optimize the 
monitoring of pre-clinical and clinical trials. 